A hybrid ViT-CNN model for tongue recognition adapted for mobile use
Abstract
Tongue recognition plays a pivotal role in traditional Chinese medicine (TCM). Although many deep learning studies have addressed tongue recognition, few have focused on lightweight models. Bringing tongue recognition to mobile devices with a lightweight network is important for the digitalisation and standardisation of TCM and for monitoring and warning of people's health conditions, and a lightweight deep-learning-based recognition method is the key to mobile deployment. In this paper, we propose SESAViT, an efficient hybrid tongue image recognition network that integrates a convolutional neural network (CNN) with a Vision Transformer (ViT). We found that although ViT excels at extracting global information in image recognition tasks, it is computationally intensive. We therefore simplify the ViT computation with a sandwich layout and a segmented attention module, removing the bottleneck of ViT's multi-head self-attention mechanism; we call this module SAViT. For the CNN part, we separate the token mixer from the channel mixer to achieve a lightweight design. Experiments on a self-built tongue dataset demonstrate the superiority of our method, with classification accuracies of 86.284%, 82.467%, 83.417%, 86.259%, 84.759% and 85.759% for tongue colour, tongue coating features, tongue body features, tongue shape, dentate tongue and cleft tongue, respectively, outperforming lightweight networks based on ViT and MobileNet V3 by about 3 percentage points.
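
The abstract names two lightweight ideas: a sandwich-layout transformer block with segmented attention (SAViT) and a CNN block that separates the token mixer from the channel mixer. The PyTorch sketch below illustrates what such blocks could look like; the module names (`SegmentedAttention`, `SandwichBlock`, `SeparableMixerBlock`) and all hyper-parameters are our own assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of the two lightweight ideas described in the abstract.
# All names and hyper-parameters are assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class SegmentedAttention(nn.Module):
    """Splits channels into segments and runs single-head attention per segment,
    instead of full multi-head self-attention over the whole embedding."""
    def __init__(self, dim: int, segments: int = 4):
        super().__init__()
        assert dim % segments == 0
        self.segments = segments
        seg_dim = dim // segments
        self.qkv = nn.ModuleList(nn.Linear(seg_dim, seg_dim * 3) for _ in range(segments))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, dim)
        outs = []
        for i, chunk in enumerate(x.chunk(self.segments, dim=-1)):
            q, k, v = self.qkv[i](chunk).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
            outs.append(attn.softmax(dim=-1) @ v)
        return self.proj(torch.cat(outs, dim=-1))


class SandwichBlock(nn.Module):
    """Sandwich layout: cheap FFN layers before and after a single attention layer."""
    def __init__(self, dim: int, segments: int = 4, mlp_ratio: int = 2):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(dim),
                                 nn.Linear(dim, dim * mlp_ratio),
                                 nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))
        self.ffn_in, self.ffn_out = ffn(), ffn()
        self.norm = nn.LayerNorm(dim)
        self.attn = SegmentedAttention(dim, segments)

    def forward(self, x):
        x = x + self.ffn_in(x)
        x = x + self.attn(self.norm(x))
        return x + self.ffn_out(x)


class SeparableMixerBlock(nn.Module):
    """CNN block with the token mixer (depthwise conv over space) separated from
    the channel mixer (1x1 convs), keeping parameters and FLOPs low."""
    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        self.token_mixer = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.channel_mixer = nn.Sequential(
            nn.Conv2d(channels, channels * expand, 1),
            nn.GELU(),
            nn.Conv2d(channels * expand, channels, 1))
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)

    def forward(self, x):                        # x: (B, C, H, W)
        x = x + self.token_mixer(self.norm1(x))
        return x + self.channel_mixer(self.norm2(x))


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 128)            # e.g. 14x14 patches, 128-dim embedding
    feats = torch.randn(2, 64, 56, 56)
    print(SandwichBlock(128)(tokens).shape)      # torch.Size([2, 196, 128])
    print(SeparableMixerBlock(64)(feats).shape)  # torch.Size([2, 64, 56, 56])
```

In this reading, the sandwich layout spends most of its parameters in the FFN layers and uses only one attention layer per block, while the segmented attention scales quadratically with sequence length only within small per-segment channel slices; the separable mixer mirrors the depthwise/pointwise split used in MobileNet-style networks.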