Image Super-Resolution Reconstruction Method Based on Lightweight CNN-Transformer

Authors

  • Wenqiang Xi
  • Zairila Juria Zainal Abidin
  • Cheng Peng
  • Tadiwa Elisha Nyamasvisva

DOI:

https://doi.org/10.54097/yqsddr19

Keywords:

Image Super-Resolution Reconstruction, Deep Learning, Transformer

Abstract

Existing Transformer-based image super-resolution reconstruction methods suffer from excessive parameter counts and high training costs. To address these issues, we propose a lightweight CNN-Transformer-based image super-resolution reconstruction method. A CNN-Transformer module is designed using weight sharing, and a channel attention module is employed to fully fuse image information, improving the reconstruction of both local and global image features. In addition, depth-wise separable convolutions are adopted and self-attention is computed from the cross-channel covariance matrix, which effectively reduces the Transformer's parameter count and lowers its computational cost. A High-Frequency Residual Block (HFRB) is then introduced to further focus on texture and detail information in the high-frequency range. Finally, the choice of activation function used by the Transformer to generate self-attention is discussed; analysis shows that the GELU activation function better promotes feature aggregation and improves network performance. Experiments show that the proposed method reconstructs more image texture and detail while remaining lightweight.
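The abstract's lightweight ingredients can be illustrated with a minimal NumPy sketch. It assumes a transposed (cross-channel) attention formulation in which the C×C channel covariance matrix replaces the (H·W)×(H·W) spatial attention map, plus a parameter-count comparison for depth-wise separable versus standard convolutions; the paper's exact projection layers, weight-sharing scheme, and HFRB design are not specified here, so every function below is illustrative rather than the authors' implementation.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, the activation the abstract favours
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def channel_self_attention(x):
    """Cross-channel self-attention over a (C, H*W) feature map.

    Attention weights come from the C x C covariance-like matrix Q @ K^T,
    so the cost scales with C**2 rather than (H*W)**2 as in spatial
    self-attention. Q/K/V projections are omitted (identity) for brevity.
    """
    c, n = x.shape
    q = k = v = x                                   # identity projections (sketch only)
    attn = q @ k.T / np.sqrt(n)                     # (C, C) channel covariance
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)  # softmax over channels
    return attn @ v                                 # (C, H*W), same shape as input

def conv_params(c_in, c_out, k):
    # standard k x k convolution (biases ignored)
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    # depth-wise k x k conv + point-wise 1 x 1 conv (biases ignored)
    return c_in * k * k + c_in * c_out
```

For a 3×3 layer with 64 input and output channels, the depth-wise separable variant needs 4,672 parameters versus 36,864 for the standard convolution, which is the roughly k²-fold reduction that makes the design lightweight.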


References

[1] A. Adler, Y. Hel-Or, and M. Elad, “A shrinkage learning approach for single image super-resolution with overcomplete representations,” in Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part II 11, Springer, 2010, pp. 622–635.

[2] J. Chen et al., “TransUNet: Transformers make strong encoders for medical image segmentation,” arXiv preprint arXiv:2102.04306, 2021.

[3] Z. Guo et al., “Super-resolution integrated building semantic segmentation for multi-source remote sensing imagery,” IEEE Access, vol. 7, pp. 99381–99397, 2019.

[4] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–1654.

[5] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 286–301.

[6] Y. Mei, Y. Fan, and Y. Zhou, “Image super-resolution with non-local sparse attention,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 3517–3526.

[7] X. Wu, K. Zhang, Y. Hu, X. He, and X. Gao, “Multi-scale non-local attention network for image super-resolution,” Signal Process., vol. 218, p. 109362, 2024.

[8] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1637–1645.

[9] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3147–3155.

[10] Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multi-distillation network,” in Proceedings of the 27th ACM international conference on multimedia, 2019, pp. 2024–2032.

[11] L. Zha, Y. Yang, Z. Lai, Z. Zhang, and J. Wen, “A lightweight dense connected approach with attention on single image super-resolution,” Electronics, vol. 10, no. 11, p. 1234, 2021.

[12] R. Lan, L. Sun, Z. Liu, H. Lu, C. Pang, and X. Luo, “MADNet: A fast and lightweight network for single-image super resolution,” IEEE Trans. Cybern., vol. 51, no. 3, pp. 1443–1453, 2020.

[13] C. Peng, P. Shu, X. Huang, Z. Fu, and X. Li, “LCRCA: image super-resolution using lightweight concatenated residual channel attention networks,” Appl. Intell., pp. 1–15, 2022.

[14] H. Feng, L. Wang, Y. Li, and A. Du, “LKASR: Large kernel attention for lightweight image super-resolution,” Knowl.-Based Syst., vol. 252, p. 109376, 2022.

[15] D. Gao and D. Zhou, “A very lightweight and efficient image super-resolution network,” Expert Syst. Appl., vol. 213, p. 118898, 2023.

[16] W. Wang, Y. Zhu, D. Ding, J. Li, and Y. Luo, “Multi-Scale Multi-Stage Single Image Super-Resolution Reconstruction Algorithm Based on Transformer,” in 2022 21st International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), IEEE, 2022, pp. 111–114.

[17] Z. Lu, J. Li, H. Liu, C. Huang, L. Zhang, and T. Zeng, “Transformer for single image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 457–466.

[18] J. Fang, H. Lin, X. Chen, and K. Zeng, “A hybrid network of cnn and transformer for lightweight image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 1103–1112.

[19] X. Li, J. Dong, J. Tang, and J. Pan, “Dlgsanet: lightweight dynamic local and global self-attention networks for image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12792–12801.

[20] Y. Zheng, S. Liu, H. Chen, and L. Bruzzone, “Hybrid FusionNet: A hybrid feature fusion framework for multi-source high-resolution remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., 2024.

[21] H. Gu, L. Su, Y. Wang, W. Zhang, and C. Ran, “Efficient Channel-Temporal Attention for Boosting RF Fingerprinting,” IEEE Open J. Signal Process., 2024.

[22] A. Li, L. Zhang, Y. Liu, and C. Zhu, “Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12514–12524.

[23] E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA: IEEE, Jul. 2017, pp. 1122–1131. doi: 10.1109/CVPRW.2017.150.

[24] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.

[25] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7, Springer, 2012, pp. 711–730.

[26] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, 2010.

[27] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.

[28] Y. Matsui et al., “Sketch-based manga retrieval using manga109 dataset,” Multimed. Tools Appl., vol. 76, pp. 21811–21838, 2017.

[29] X. Gao, W. Lu, D. Tao, and X. Li, “Image quality assessment based on multiscale geometric analysis,” IEEE Trans. Image Process., vol. 18, no. 7, pp. 1409–1423, 2009.

Published

15-07-2025

Issue

Section

Articles

How to Cite

Xi, W., Abidin, Z. J. Z., Peng, C., & Nyamasvisva, T. E. (2025). Image Super-Resolution Reconstruction Method Based on Lightweight CNN-Transformer. Computer Life, 13(2), 1-6. https://doi.org/10.54097/yqsddr19