2026, No. 1, Vol. 52, pp. 33-39
A Dynamic Multi-Scale Feature Enhancement Network Based on RepVGG
DOI: 10.20149/j.cnki.issn1008-1739.2026.01.005

Abstract:

To address the defects of static convolution kernels and unidirectional feature propagation in the lightweight RepVGG model, a dynamic multi-scale feature enhancement network, RepVGG-Dynamic, is proposed. A Dynamic Multi-Branch Convolution (DMBC) module fuses multi-scale dilated-convolution branches and employs a gating network to generate input-adaptive weights. A bidirectional feature-pyramid structure with Cross-Stage Feature Fusion (CSFF) is designed to achieve multi-level feature interaction. Additionally, an Adaptive Dynamic Activation (ADA) function is introduced that dynamically adjusts thresholds and slopes based on feature statistics. Experiments on the Imagenette2-320 and Oxford-102 Flowers datasets demonstrate that the proposed model achieves Top-1 accuracies of 93.8%/93.3%, outperforming MobileOne-S4 (89.6%/93.2%) and RepVGG-A0 (91.3%/92.2%), with only 9.12 M parameters and 1.61 G floating-point operations (FLOPs). Ablation studies further show that DMBC contributes the largest independent gain on Imagenette2-320 (+1.4%), CSFF delivers the strongest fine-grained improvement on Oxford-102 Flowers (+0.7%), and ADA achieves consistent gains on both datasets (+0.3% on Imagenette2-320, +0.2% on Oxford-102 Flowers). The model offers edge devices a balanced solution combining dynamic perception and hardware efficiency.
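The two dynamic mechanisms named in the abstract — gating-network weights over multi-scale branches (DMBC) and an activation whose threshold and slope follow feature statistics (ADA) — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the gating parameters `W`, `b`, the scale factor `alpha`, and the specific statistics used are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gated_branch_fusion(x, branch_outputs, W, b):
    """DMBC-style gating sketch: derive input-adaptive weights from the
    globally pooled feature map and blend the multi-scale branch outputs.
    W and b are hypothetical gating-network parameters."""
    ctx = x.mean(axis=(1, 2))               # global average pool -> (C,)
    weights = softmax(W @ ctx + b)          # one weight per branch, sums to 1
    fused = sum(w * y for w, y in zip(weights, branch_outputs))
    return fused, weights

def adaptive_dynamic_activation(x, alpha=0.1):
    """ADA-style sketch: threshold and negative-side slope are driven by
    per-channel statistics (mean and std) of the input features."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    threshold = mu - alpha * sigma          # input-dependent threshold
    slope = 1.0 / (1.0 + sigma)             # gentler slope for noisy channels
    # identity above the threshold, scaled response below it (continuous join)
    return np.where(x > threshold, x, threshold + slope * (x - threshold))

# Toy usage: random features stand in for the dilated-convolution branch outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                  # (C, H, W) feature map
branches = [rng.standard_normal(x.shape) for _ in range(3)]
W, b = rng.standard_normal((3, 8)), np.zeros(3)
fused, w = gated_branch_fusion(x, branches, W, b)
```

The key property the sketch preserves is that both the branch weights and the activation shape are recomputed per input, rather than being fixed at training time as in the plain RepVGG block.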

References

[1] LIU Z,LIN Y T,CAO Y,et al.Swin Transformer:Hierarchical Vision Transformer Using Shifted Windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Montreal:IEEE,2021:10012-10022.

[2] TOUVRON H,CORD M,DOUZE M,et al.Training Data-efficient Image Transformers & Distillation Through Attention[C]//Proceedings of the 38th International Conference on Machine Learning.London:PMLR,2021:10347-10357.

[3] VASU P K A,GABRIEL J,ZHU J,et al.MobileOne:An Improved One Millisecond Mobile Backbone[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:7907-7917.

[4] MEHTA S,RASTEGARI M.MobileViT:Light-weight,General-purpose,and Mobile-friendly Vision Transformer[EB/OL].(2021-10-05)[2025-02-18].https://arxiv.org/abs/2110.02178.

[5] DING X H,ZHANG X Y,MA N N,et al.RepVGG:Making VGG-style ConvNets Great Again[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).Nashville:IEEE,2021:13728-13737.

[6] YANG B,LE Q V,BENDER G,et al.CondConv:Conditionally Parameterized Convolutions for Efficient Inference[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.New York:ACM,2019:1307-1318.

[7] CHEN Y P,DAI X Y,LIU M C,et al.Dynamic Convolution:Attention over Convolution Kernels[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Seattle:IEEE,2020:11030-11039.

[8] LIN T Y,DOLLÁR P,GIRSHICK R,et al.Feature Pyramid Networks for Object Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE,2017:2117-2125.

[9] LIU S,QI L,QIN H F,et al.Path Aggregation Network for Instance Segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:8759-8768.

[10] HU J,SHEN L,ALBANIE S.Squeeze-and-Excitation Networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,42(8):2011-2023.

[11] CHEN Y P,DAI X Y,LIU M C,et al.Dynamic ReLU[C]//Proceedings of the European Conference on Computer Vision.Berlin:Springer,2020:351-367.

[12] HOWARD J.Imagenette2-320[DB/OL].(2019-12-06)[2025-01-20].https://github.com/fastai/imagenette.

[13] NILSBACK M E,ZISSERMAN A.Automated Flower Classification over a Large Number of Classes[C]//2008 Sixth Indian Conference on Computer Vision,Graphics & Image Processing.Bhubaneswar:IEEE,2008:722-729.

[14] LOSHCHILOV I,HUTTER F.Decoupled Weight Decay Regularization[EB/OL].(2017-11-14)[2025-02-16].https://arxiv.org/abs/1711.05101.

[15] CUBUK E D,ZOPH B,SHLENS J,et al.Randaugment:Practical Automated Data Augmentation with a Reduced Search Space[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).Seattle:IEEE,2020:702-703.

[16] YUN S,HAN D,OH S J,et al.CutMix:Regularization Strategy to Train Strong Classifiers with Localizable Features[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV).Seoul:IEEE,2019:6023-6032.

[17] LARSSON G,MAIRE M,SHAKHNAROVICH G.FractalNet:Ultra-deep Neural Networks Without Residuals[EB/OL].(2016-05-24)[2025-03-18].https://arxiv.org/abs/1605.07648.

[18] YU W,LUO M,ZHOU P,et al.MetaFormer is Actually What You Need for Vision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE,2022:10819-10829.

Basic Information:


CLC Number: TP391.41; TP183

Citation:

[1] CHEN Xiaoguang. A Dynamic Multi-Scale Feature Enhancement Network Based on RepVGG[J]. 计算机与网络, 2026, 52(01): 33-39. DOI: 10.20149/j.cnki.issn1008-1739.2026.01.005.
