Journal Article

A hybrid ViT-CNN model of tongue recognition adapted for mobile use

by Miao Tan 1,* and Hongbin Li 1
1 Taiyuan Normal University
* Author to whom correspondence should be addressed.
Received: / Accepted: / Published Online: 31 December 2024

Abstract

Tongue recognition plays a pivotal role in traditional Chinese medicine (TCM). Many studies have applied deep learning to tongue recognition, but few address lightweight models. Bringing tongue recognition to mobile devices with a lightweight network is important for the digitalisation and standardisation of TCM, as well as for monitoring and early warning of people's health conditions, and a lightweight deep learning recognition method is the key to mobile deployment. In this paper we propose an efficient hybrid tongue image recognition network (SESAViT), a novel neural network that integrates a convolutional neural network (CNN) with a Vision Transformer (ViT). We found that, in image recognition tasks, ViT is good at extracting global information but is computationally intensive. We therefore simplify the ViT computation with a sandwich layout and a segmented attention module, which removes the limitation imposed by ViT's multi-head self-attention mechanism; we call this model SAViT. For the convolutional part, we separate the token mixer from the channel mixer to make the network lighter. Experiments on a self-built tongue dataset demonstrate the superiority of our method: classification accuracies for tongue colour, tongue coating (moss) features, tongue body features, tongue shape, dentate tongue and cleft tongue are 86.284%, 82.467%, 83.417%, 86.259%, 84.759% and 85.759%, respectively, about 3% higher than lightweight networks based on ViT and MobileNet v3.
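
To illustrate why separating the token mixer from the channel mixer makes a convolutional block lighter, the sketch below compares weight counts. It is not the authors' implementation: the abstract does not specify the layers, so the sketch assumes the common depthwise-plus-pointwise form of the idea (a k x k depthwise convolution as token mixer, a 1 x 1 convolution as channel mixer). The function names and the layer sizes are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k convolution that mixes space and channels jointly (bias omitted)."""
    return k * k * c_in * c_out


def separated_mixer_params(c_in: int, c_out: int, k: int) -> int:
    """Weights when the two roles are split (bias omitted).

    Assumed form, not confirmed by the paper:
      token mixer   = k x k depthwise convolution, one filter per input channel
      channel mixer = 1 x 1 pointwise convolution across channels
    """
    token_mixer = k * k * c_in
    channel_mixer = c_in * c_out
    return token_mixer + channel_mixer


if __name__ == "__main__":
    # Hypothetical layer sizes chosen for illustration, not taken from the paper.
    c_in, c_out, k = 64, 64, 3
    full = standard_conv_params(c_in, c_out, k)
    split = separated_mixer_params(c_in, c_out, k)
    print(f"standard: {full} weights, separated: {split} weights")
```

With these illustrative sizes the joint convolution needs 36864 weights and the separated pair needs 4672, which is the kind of saving that motivates the split on mobile hardware.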


Copyright: © 2024 by Tan and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (Creative Commons Attribution 4.0 International License). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Share and Cite

ACS Style
Tan, M.; Li, H. A hybrid ViT-CNN model of tongue recognition adapted for mobile use. Journal of Globe Scientific Reports, 2024, 6, 133. doi:10.69610/j.gsr.20241231011
AMA Style
Tan M, Li H. A hybrid ViT-CNN model of tongue recognition adapted for mobile use. Journal of Globe Scientific Reports; 2024, 6(6):133. doi:10.69610/j.gsr.20241231011
Chicago/Turabian Style
Tan, Miao, and Hongbin Li. 2024. "A hybrid ViT-CNN model of tongue recognition adapted for mobile use." Journal of Globe Scientific Reports 6, no. 6: 133. doi:10.69610/j.gsr.20241231011
