A hybrid ViT-CNN model for tongue recognition adapted for mobile use
Abstract
Tongue recognition plays a pivotal role in traditional Chinese medicine (TCM). Although many deep learning studies have addressed tongue recognition, few have focused on lightweight models. Bringing tongue recognition to mobile devices with a lightweight network is important for the digitalisation and standardisation of TCM and for monitoring and warning of people's health conditions, and a lightweight deep-learning-based recognition method is the key to mobile deployment. In this paper, we propose SESAViT, an efficient hybrid tongue image recognition network that integrates a convolutional neural network (CNN) with a Vision Transformer (ViT). We found that although ViT excels at extracting global information in image recognition tasks, it is computationally intensive. We therefore simplify the ViT computation with a sandwich layout and a segmented attention module, removing the bottleneck of ViT's multi-head self-attention mechanism; we call this module SAViT. For the CNN part, we separate the token mixer from the channel mixer to achieve a lightweight design. Experiments on a self-built tongue dataset demonstrate the superiority of our method, with classification accuracies of 86.284%, 82.467%, 83.417%, 86.259%, 84.759% and 85.759% for tongue colour, tongue coating features, tongue body features, tongue shape, dentate tongue and cleft tongue, respectively, outperforming lightweight networks based on ViT and MobileNet V3 by about 3 percentage points.
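
The abstract names two lightweight ideas: a sandwich-layout transformer block with segmented attention (SAViT) and a CNN block that separates the token mixer from the channel mixer. The PyTorch sketch below illustrates what such blocks could look like; the module names (`SegmentedAttention`, `SandwichBlock`, `SeparableMixerBlock`) and all hyper-parameters are our own assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of the two lightweight ideas described in the abstract.
# All names and hyper-parameters are assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class SegmentedAttention(nn.Module):
    """Splits channels into segments and runs single-head attention per segment,
    instead of full multi-head self-attention over the whole embedding."""
    def __init__(self, dim: int, segments: int = 4):
        super().__init__()
        assert dim % segments == 0
        self.segments = segments
        seg_dim = dim // segments
        self.qkv = nn.ModuleList(nn.Linear(seg_dim, seg_dim * 3) for _ in range(segments))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, dim)
        outs = []
        for i, chunk in enumerate(x.chunk(self.segments, dim=-1)):
            q, k, v = self.qkv[i](chunk).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
            outs.append(attn.softmax(dim=-1) @ v)
        return self.proj(torch.cat(outs, dim=-1))


class SandwichBlock(nn.Module):
    """Sandwich layout: cheap FFN layers before and after a single attention layer."""
    def __init__(self, dim: int, segments: int = 4, mlp_ratio: int = 2):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(dim),
                                 nn.Linear(dim, dim * mlp_ratio),
                                 nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))
        self.ffn_in, self.ffn_out = ffn(), ffn()
        self.norm = nn.LayerNorm(dim)
        self.attn = SegmentedAttention(dim, segments)

    def forward(self, x):
        x = x + self.ffn_in(x)
        x = x + self.attn(self.norm(x))
        return x + self.ffn_out(x)


class SeparableMixerBlock(nn.Module):
    """CNN block with the token mixer (depthwise conv over space) separated from
    the channel mixer (1x1 convs), keeping parameters and FLOPs low."""
    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        self.token_mixer = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.channel_mixer = nn.Sequential(
            nn.Conv2d(channels, channels * expand, 1),
            nn.GELU(),
            nn.Conv2d(channels * expand, channels, 1))
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)

    def forward(self, x):                        # x: (B, C, H, W)
        x = x + self.token_mixer(self.norm1(x))
        return x + self.channel_mixer(self.norm2(x))


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 128)            # e.g. 14x14 patches, 128-dim embedding
    feats = torch.randn(2, 64, 56, 56)
    print(SandwichBlock(128)(tokens).shape)      # torch.Size([2, 196, 128])
    print(SeparableMixerBlock(64)(feats).shape)  # torch.Size([2, 64, 56, 56])
```

In this reading, the sandwich layout spends most of its parameters in the FFN layers and uses only one attention layer per block, while the segmented attention scales quadratically with sequence length only within small per-segment channel slices; the separable mixer mirrors the depthwise/pointwise split used in MobileNet-style networks.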