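Some readers recently asked whether SNPE can write parameters obtained from quantization-aware training into a quantized DLC.
SNPE has supported overriding DLC quantization parameters for a long time, but, probably because the documentation is incomplete, many people still do not know how to use the feature.
This article briefly introduces how to override the quantization parameters of a DLC and what to watch out for.
- How to write the min/max parameters of TF FakeQuant nodes into a DLC
- How to write quantization parameters into a DLC through an encoding.json file
- How snpe-dlc-quantize handles the overriding quantization parameters
This article is based on SNPE 1.59.0.
I. Writing FakeQuant parameters from a PB into the DLC
After TensorFlow quantization-aware training you get a network like the one in the figure below. The network contains additional FakeQuantWithMinMaxVars nodes, which store the min and max values obtained from quantization-aware training.
Figure-1 PB with FakeQuant nodes
- The quantization parameters for the Conv2D weights are in the FakeQuant-1 node: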
- min = -6.410740375518799
- max = 7.991758823394775
- The quantization parameters for the activation of the ReLU node are in the FakeQuant-2 node:
- min = 0
- max = 66.1359634399414
- The SNPE tools support writing these quantization parameters (encodings) into the quantized DLC.
1. Convert the PB containing FakeQuant nodes to a DLC.
snpe-tensorflow-to-dlc --input_network test.pb --input_dim noisy_image 1,512,512,3 --out_node downsample_0/conv2d_0/Relu --output_path test.dlc
- During this step SNPE also writes the FakeQuant min, max, and bitwidth into the floating-point DLC; see the code below.
#<snpe-sdk>\lib\python\qti\aisw\converters\tensorflow\layers\fake_quant.py
class FakeQuantLayerBuilder(LayerBuilder):
...
# save quantization encodings for previous layer. ie quantization is done on the outputs of the previous
# layer. node_x -> fakequant_node -> node_y
# save quantization encodings for next layer. ie quantization is done on the const inputs of the next
# layer. weights_node -> fakequant_node -> node_x
....
- The floating-point DLC stores the FakeQuant output values, that is, the weights after quantization and dequantization; see the code below.
#<snpe-sdk>\lib\python\qti\aisw\converters\tensorflow\layers\convolution.py
def get_weights_tensor(self, graph_helper, weights_source_op):
    if graph_helper.check_tensor_const_origin(weights_source_op.outputs[0])[0]:
        return graph_helper.evaluate_tensor_output(weights_source_op.outputs[0])
    return None
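To make the quantize-then-dequantize behavior concrete, here is a minimal Python sketch of what a FakeQuant node does to a tensor. It is a simplified illustration, not TensorFlow's or SNPE's implementation: the function name fake_quantize is hypothetical, and the zero-point nudging that real implementations apply to min/max is left out.

```python
def fake_quantize(values, min_val, max_val, bitwidth=8):
    """Snap values onto a uniform grid in [min_val, max_val], then return floats.

    Simplified sketch: min/max are used as given, without zero-point nudging.
    """
    num_steps = 2 ** bitwidth - 1              # 255 intervals for 8 bits
    scale = (max_val - min_val) / num_steps    # width of one quantization step
    result = []
    for v in values:
        clamped = min(max(v, min_val), max_val)      # saturate to the range
        level = round((clamped - min_val) / scale)   # integer level 0..num_steps
        result.append(min_val + level * scale)       # dequantize back to float
    return result


# Weight range taken from the FakeQuant-1 node in Figure-1.
w_min, w_max = -6.410740375518799, 7.991758823394775
weights = [-10.0, -1.2345, 0.0, 0.5, 3.14159, 12.0]
for w, fq in zip(weights, fake_quantize(weights, w_min, w_max)):
    print(f"{w:>9.5f} -> {fq:>9.5f}")
```

Values outside [min, max] saturate to the range limits, and every other value moves by at most half a step. This is why the weights stored in the floating-point DLC already sit on the quantization grid.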
2. Use the --override_params option of snpe-dlc-quantize to apply these encodings.
- Without --override_params, SNPE performs ordinary post-training quantization.
snpe-dlc-quantize --input_dlc test.dlc --input_list raw_list.txt --output_dlc test_quantized.dlc
# Dump the DLC info
snpe-dlc-info -i test_quantized.dlc > test_quantized.dlc.info.txt
- With --override_params, SNPE performs post-training quantization and then overrides the parameters of the layers that carry FakeQuant encodings in the quantized DLC.
snpe-dlc-quantize --input_dlc test.dlc --input_list raw_list.txt --output_dlc test_quantized_override.dlc --override_params
snpe-dlc-info -i test_quantized_override.dlc > test_quantized_override.dlc.info.txt
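If you compare the two variants often, the invocations above can be scripted. The sketch below only assembles the command lines from the flags shown in this article. The helper name build_quantize_cmd is hypothetical, and the commands are executed only if snpe-dlc-quantize is found on your PATH.

```python
import shutil
import subprocess


def build_quantize_cmd(input_dlc, input_list, output_dlc, override_params=False):
    """Assemble an snpe-dlc-quantize command line as an argv list."""
    cmd = [
        "snpe-dlc-quantize",
        "--input_dlc", input_dlc,
        "--input_list", input_list,
        "--output_dlc", output_dlc,
    ]
    if override_params:
        cmd.append("--override_params")   # apply encodings carried in the DLC
    return cmd


plain = build_quantize_cmd("test.dlc", "raw_list.txt", "test_quantized.dlc")
override = build_quantize_cmd("test.dlc", "raw_list.txt",
                              "test_quantized_override.dlc", override_params=True)

for cmd in (plain, override):
    print(" ".join(cmd))
    # Run only when the SNPE tool is actually installed.
    if shutil.which(cmd[0]):
        subprocess.run(cmd, check=True)
```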
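3. Check the encodings of the quantized DLC
A. Weights encoding of the Conv2D node
Figure2-1 Conv2d Weights Encoding
- The green box shows the min/max after using --override_params; they are nearly identical to the min/max of Fakequant_1 in the PB.
- The small difference comes from zero-point alignment, which is explained later in this article.
B. Activation encoding of the Relu node
Figure2-2 Relu Activation Encoding
- The green box shows the min/max after using --override_params; they match the min/max of Fakequant_2 in the PB.
II. Overriding DLC quantization parameters through encoding.json
1. The --quantization_overrides option of the SNPE tools
The SNPE converters have a --quantization_overrides option, which supplies the quantization parameters to use for the model. The parameters are passed in as a JSON file.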
╰─ snpe-tensorflow-to-dlc -h
╰─ snpe-pytorch-to-dlc -h
....
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
2. The required JSON file format is shown below:
- "activation_encodings" holds the quantization parameters of activation tensors, keyed by tensor name
- "param_encodings" holds the quantization parameters of weight and bias tensors
{
"activation_encodings": {
"inference/coefficients/splat_cond/conv1/BiasAdd:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 263.1574777999113,
"min": -3.2438893875886934,
"offset": -798,
"scale": 0.0040650242952239264
}
],
"inference/coefficients/splat_cond/conv1/LeakyRelu:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 176.4537167516027,
"min": -0.6377291712488437,
"offset": -236,
"scale": 0.002702242251054422
}
],
"inference/coefficients/splat_cond/res1_1/conv1/BiasAdd:0": [
{
"bitwidth": 16,
"is_symmetric": "False",
"max": 297.32563562059215,
"min": -2.0971243891734557,
"offset": -459,
"scale": 0.004568898451358291
}
]
},
"param_encodings": {
"inference/coefficients/splat_cond/conv1/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 0.7084210564108456,
"min": -0.7028865169076359,
"offset": -127,
"scale": 0.0055345395032097315
}
],
"inference/coefficients/splat_cond/res1_1/conv1/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 1.921162235035616,
"min": -1.4808958895066204,
"offset": -111,
"scale": 0.013341404409969554
}
],
"inference/coefficients/splat_cond/res1_1/conv2/weights/read:0": [
{
"bitwidth": 8,
"is_symmetric": "False",
"max": 1.702315330505371,
"min": -2.4318790435791016,
"offset": -150,
"scale": 0.01621252695719401
}
]
}
}
3. Usage
Here we use test2.pb, a model without FakeQuant nodes, together with the encoding.json above.
Figure3 test2.pb
1) Run snpe-tensorflow-to-dlc with --quantization_overrides encoding.json.
╰─ snpe-tensorflow-to-dlc --input_network test2.pb --quantization_overrides encoding.json --input_dim Placeholder 1,256,256,4 --out_node inference/coefficients/splat_cond/res1_1/conv1/BiasAdd --output_path test2.dlc
....
2022-02-10 19:52:09,963 - 214 - INFO - Processing user provided quantization encodings:
2022-02-10 19:52:10,008 - 214 - INFO - INFO_ALL_BUILDING_NETWORK:
2) Run snpe-dlc-quantize with --override_params
╰─ snpe-dlc-quantize --input_dlc test2.dlc --input_list net_run_test2/raw_list.txt --output_dlc test_quantized_json.dlc --override_params --act_bitwidth 16
[INFO] Setting activation for layer: inference/coefficients/splat_cond/conv1/Conv2D and buffer: inference/coefficients/splat_cond/conv1/BiasAdd:0
[INFO] bw: 16, min: -3.243889, max: 263.157471, delta: 0.004065, offset: -798.000000
[INFO] Setting activation for layer: inference/coefficients/splat_cond/conv1/LeakyRelu and buffer: inference/coefficients/splat_cond/conv1/LeakyRelu:0
[INFO] bw: 16, min: -0.637729, max: 176.453720, delta: 0.002702, offset: -236.000000
[INFO] Setting activation for layer: inference/coefficients/splat_cond/res1_1/conv1/Conv2D and buffer: inference/coefficients/splat_cond/res1_1/conv1/BiasAdd:0
[INFO] bw: 16, min: -2.097124, max: 297.325623, delta: 0.004569, offset: -459.000000
[INFO] Writing quantized model to: test_quantized_json.dlc
[INFO] DebugLog shutting down.
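The DLC info shows quantization parameters that nearly match those in the JSON. The differences in the trailing decimal places are probably caused by the values being stored with a different number of digits.
- ONNX, PyTorch, and TFLite models use the same two options: --quantization_overrides when converting to DLC and --override_params when quantizing.
- The most direct way to generate encoding.json is AIMET, which has an API for it. You can also generate the file yourself in the format above.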
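If you write encoding.json by hand, a quick sanity check helps catch mismatched fields. The sketch below assumes the relationships that the sample values above satisfy, namely scale = (max - min) / (2^bitwidth - 1) and min = scale * offset. The function name check_encodings and the tolerances are my own choices, not part of SNPE or AIMET.

```python
import json
import math


def check_encodings(encodings, rel_tol=1e-6):
    """Return a list of (section, tensor_name, problem); an empty list means consistent."""
    problems = []
    for section in ("activation_encodings", "param_encodings"):
        for name, entries in encodings.get(section, {}).items():
            for enc in entries:
                steps = 2 ** enc["bitwidth"] - 1
                expected_scale = (enc["max"] - enc["min"]) / steps
                if not math.isclose(enc["scale"], expected_scale, rel_tol=rel_tol):
                    problems.append((section, name, "scale != (max - min) / steps"))
                if not math.isclose(enc["scale"] * enc["offset"], enc["min"],
                                    rel_tol=rel_tol, abs_tol=1e-9):
                    problems.append((section, name, "min != scale * offset"))
    return problems


# Two entries copied from the sample encoding.json above.
sample = json.loads("""
{
  "activation_encodings": {
    "inference/coefficients/splat_cond/conv1/BiasAdd:0": [
      {"bitwidth": 16, "is_symmetric": "False",
       "max": 263.1574777999113, "min": -3.2438893875886934,
       "offset": -798, "scale": 0.0040650242952239264}
    ]
  },
  "param_encodings": {
    "inference/coefficients/splat_cond/conv1/weights/read:0": [
      {"bitwidth": 8, "is_symmetric": "False",
       "max": 0.7084210564108456, "min": -0.7028865169076359,
       "offset": -127, "scale": 0.0055345395032097315}
    ]
  }
}
""")
print(check_encodings(sample))
```

To check a file on disk, load it with json.load and pass the resulting dictionary to the same function.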
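III. How snpe-dlc-quantize processes quantization parameters
1. Computing scale and offset from min and max
In the first approach, where the FakeQuant encodings in the PB are written into the DLC, scale, offset, and is_symmetric are not provided.
In that case the following computation is performed: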
aimet/TfQuantizer.cpp at develop · quic/aimet · GitHub
{
    double num_steps = pow(2, bw) - 1;
    // Make sure zero value is within the range
    double new_min = std::min(0.0, stats.min);
    double new_max = std::max(0.0, stats.max);
    // When the min and max are too close together, nudge the maximum to meet the
    // minimum range requirement
    // This also handles the case where min==max==0 to avoid division by zero
    new_max = std::max(new_max, new_min + MIN_RANGE);
    encoding.delta = (new_max - new_min) / num_steps;
    if (new_min < 0 && new_max > 0)
    {
        // Need to make sure 0-value is exactly quantizable
        // Quantization of q into b is given by:
        // b = q / delta - offset, where
        // delta = (max - min)/#steps
        // offset = min / delta
        // For q = 0: b = -min / delta
        // Find the closest round b, and set q=0 for it
        double b_zero = round(-new_min / encoding.delta);
        b_zero = std::min(num_steps, std::max(0.0, b_zero)); // just to be safe
        encoding.offset = -b_zero;
    }
    else
    {
        // One of min or max is guaranteed to be zero, so 0 is exactly quantizable already
        encoding.offset = round(new_min / encoding.delta);
    }
    // Calculate 'min' and 'max' based on 'delta' and 'offset'.
    // Note this min and max can vary from the one in 'stats'. This min and max
    // can really be represented with the integer offset.
    encoding.min = encoding.delta * encoding.offset;
    // We want to calculate: max = delta * num_steps + min.
    // To avoid numerical accuracy issues on Linaro, we simplify the math.
    encoding.max = new_max - new_min + encoding.min;
    encoding.bw = bw;
}
As you can see, min and max are adjusted, mainly so that the zero point is aligned (the value 0 becomes exactly representable).
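For readers who prefer Python, the C++ logic above can be transcribed roughly as follows. This is my own port for illustration, not SNPE or AIMET code. MIN_RANGE does not appear in the excerpt, so the value 0.01 used here is an assumption. Python's round() also rounds exact halves to even while C++ round() rounds them away from zero, so results can differ in exact .5 cases.

```python
MIN_RANGE = 0.01  # assumed value; the C++ excerpt does not show this constant


def compute_encoding(stat_min, stat_max, bw):
    """Derive delta/offset and the adjusted min/max from raw min/max statistics."""
    num_steps = 2 ** bw - 1
    # Make sure zero lies inside the range.
    new_min = min(0.0, stat_min)
    new_max = max(0.0, stat_max)
    # Enforce a minimum range; this also avoids division by zero.
    new_max = max(new_max, new_min + MIN_RANGE)
    delta = (new_max - new_min) / num_steps
    if new_min < 0 and new_max > 0:
        # Pick the integer level closest to real 0 so that 0 is exactly representable.
        b_zero = round(-new_min / delta)
        b_zero = min(num_steps, max(0, b_zero))
        offset = -b_zero
    else:
        # One of min or max is already zero.
        offset = round(new_min / delta)
    enc_min = delta * offset
    enc_max = new_max - new_min + enc_min
    return {"min": enc_min, "max": enc_max, "delta": delta, "offset": offset, "bw": bw}


# FakeQuant-1 weight range from Figure-1, quantized to 8 bits.
enc = compute_encoding(-6.410740375518799, 7.991758823394775, 8)
print(enc)
```

With the Figure-1 weight range the offset comes out as -114, so the stored min moves from about -6.4107 to about -6.4388 while the width of the range is preserved. This illustrates the kind of small min/max shift that zero-point alignment introduces.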
2. use_symmetric_quantize_weights
snpe-dlc-quantize has an option called use_symmetric_quantize_weights.
[ --use_symmetric_quantize_weights ]
Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
If is_symmetric is False in your encoding and you also pass this option, the tool reports an error. To use the option together with overridden encodings, set is_symmetric to True in the encoding.
snpe-dlc-quantize --input_dlc test2.dlc --input_list net_run_test2/raw_list.txt --output_dlc test_quantized_json.dlc --override_params --act_bitwidth 16 --debug3 --use_symmetric_quantize_weights
[ERROR] Requested symmetric weights but instead got is_symmetric==False.
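If the min and max in the encoding are not symmetric, then even with is_symmetric set to True the range is forced to be symmetric: min = -m and max = m, where m = max(abs(min), abs(max)).
A reminder: on the Snapdragon 888, HTP accuracy is somewhat better when the model is trained with symmetric weight quantization.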
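The forced symmetric range described above can be written out in a few lines. This only illustrates the formula from the text; the function name symmetric_range is hypothetical and is not an SNPE API.

```python
def symmetric_range(min_val, max_val):
    """Widen [min_val, max_val] to the smallest range that is symmetric about zero."""
    m = max(abs(min_val), abs(max_val))
    return -m, m


# Weight range of res1_1/conv2 from the sample encoding.json.
print(symmetric_range(-2.4318790435791016, 1.702315330505371))
# → (-2.4318790435791016, 2.4318790435791016)
```

Because the positive side is stretched to match the larger magnitude, an asymmetric range loses some resolution when it is forced to be symmetric. Training with symmetric weights from the start avoids that loss.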
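Summary
At this point we have seen how to override the quantization parameters in a DLC with parameters obtained from quantization-aware training.
I look forward to further discussion with you.
References: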
- <snpe-sdk>\lib\python\qti\aisw\converters\tensorflow\layers\fake_quant.py
- <snpe-sdk>\lib\python\qti\aisw\converters\tensorflow\layers\convolution.py
- aimet/ModelOptimizations/DlQuantization/src at develop · quic/aimet · GitHub
Related posts:
Part 2: [Getting Started with SNPE (2): Running InceptionV3 Inference on a Phone]
Author: Wenhao