Spatial-channel transformer network based on mask-RCNN for efficient mushroom instance segmentation
Keywords:
edible mushrooms, picking, instance segmentation, deep learning, algorithm
Abstract
Edible mushrooms are rich in nutrients; however, harvesting still relies mainly on manual labor. Coarse localization of each mushroom is a prerequisite for a robotic arm to pick edible mushrooms accurately. Previous studies used detection algorithms that ignore mushroom pixel-level information, and this information is lost when the algorithms are combined with a depth map. Moreover, among instance segmentation algorithms, convolutional neural network (CNN)-based methods are lightweight, but the features they extract are not correlated. To guarantee real-time location detection and improve the accuracy of mushroom segmentation, this study proposes a new spatial-channel transformer network model based on Mask-RCNN (SCT-Mask-RCNN). Fusing Mask-RCNN with the self-attention mechanism extracts global correlations among image features along the channel and spatial dimensions. Mask-RCNN is then used to maintain a lightweight structure and to extract local features through a spatial pyramid pooling structure, which achieves multiscale local feature fusion and improves detection accuracy. The SCT-Mask-RCNN method achieved a segmentation accuracy of 0.750 on segm_Precision_mAP and a detection accuracy of 0.638 on Bbox_Precision_mAP. Compared with existing methods, it improved Bbox_Precision_mAP and segm_Precision_mAP by over 2% and 5%, respectively.
DOI: 10.25165/j.ijabe.20241704.8987
Citation: Wang J L, Song W D, Zheng W G, Feng Q C, Wang M F, Zhao C J. Spatial-channel transformer network based on mask-RCNN for efficient mushroom instance segmentation. Int J Agric & Biol Eng, 2024; 17(4): 227–235.
References
[1] Wang M, Zhao R. A review on nutritional advantages of edible mushrooms and its industrialization development situation in protein meat analogues. Journal of Future Foods, 2023; 3(1): 1–7.
[2] Li C, Xu S. Edible mushroom industry in China: Current state and perspectives. Applied Microbiology and Biotechnology, 2022; 106(11): 3949–3955.
[3] Retsinas G, Efthymiou N, Anagnostopoulou D, Maragos P. Mushroom detection and three dimensional pose estimation from multi-view point clouds. Sensors, 2023; 23(7): 3576.
[4] Hua X, Li H, Zeng J, Han C, Chen T, Tang L, et al. A review of target recognition technology for fruit picking robots: from digital image processing to deep learning. Applied Sciences, 2023; 13(7): 4160.
[5] Qi X, Dong J, Lan Y, Zhu H. Method for identifying litchi picking position based on YOLOv5 and PSPNet. Remote Sensing, 2022; 14(9): 2004.
[6] Dean Z, Liu X Y, Chen Y, Jin J, Jia W K, Hu C L. Image recognition at night for apple picking robot. Transactions of the CSAM, 2015; 46(3): 15–22.
[7] Xu C, Lu Y, Jiang H, Liu S, Ma Y, Zhao T. Counting crowded soybean pods based on deformable attention recursive feature pyramid. Agronomy, 2023; 13(6): 1507.
[8] Yang C H, Xiong L Y, Wang Z, Wang Y, Shi G, Kuremot T, et al. Integrated detection of citrus fruits and branches using a convolutional neural network. Comput Electron in Agric, 2020; 174: 105469.
[9] Chen W, Lu S, Liu B, Li G, Qian T. Detecting citrus in orchard environment by using improved YOLOv4. Scientific Programming, 2020; 2020: 1–3.
[10] Chen P, Li W, Yao S, Ma C, Zhang J, Wang B, et al. Recognition and counting of wheat mites in wheat fields by a three-step deep learning method. Neurocomputing, 2021; 437: 21–30.
[11] Li R, Wang R J, Zhang J, Xie C J, Liu L, Wang F Y, et al. An effective data augmentation strategy for CNN-based pest localization and recognition in the field. IEEE Access, 2019; 7: 160274–160283.
[12] Liu T, Chen W, Wu W, Sun C M, Guo W S, Zhu X K. Detection of aphids in wheat fields using a computer vision technique. Biosystems Engineering, 2016; 141: 82–93.
[13] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2017; pp.2961–2969.
[14] Huang Z J, Huang L C, Gong Y C, Huang C, Wang X G. Mask scoring R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp.6409–6418.
[15] Sun C Z, Hu X M, Yu T. Structural design of agaricus bisporus picking robot based on cartesian coordinate system. Electrical Engineering and Computer Science (EECS), 2019; 2: 103–106.
[16] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg A C, Lo W Y, Dollár P. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp.4015–4026.
[17] Cai Z Y, Jian Y, Zhang Z Y, Jin C Q, Da F P. SST-ReversibleNet: Reversible-prior-based spectral-spatial transformer for efficient hyperspectral image reconstruction. Arxiv preprint, 2023; arxiv: 2305.04054.
[18] Cai Z Y, Li C Y, Yu Y, Jin C Q, Da F P. Momentum accelerated unfolding network with spectral-spatial prior for computational spectral imaging. Applied Soft Computing, 2024; Feb 21: 111420.
[19] Chen K, Pang J M, Wang J Q, Xiong Y, Li X X, Sun S Y, et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp.4974–4983.
[20] Yang S Z, Huang J, Yu X Y, Yu T. Research on a segmentation and location algorithm based on mask RCNN for agaricus bisporus. In 2022 2nd International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), IEEE, 2022; pp.717–721.
[21] Cong P C, Feng H, Lv K F, Zhou J C, Li S D. MYOLO: a lightweight fresh shiitake mushroom detection model based on YOLOv3. Agriculture, 2023; 13(2): 392.
[22] Hafiz A M, Bhat G M. A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval, 2020; 9(3): 171–189.
[23] Romera-Paredes B, Torr P H. Recurrent instance segmentation. In Proceedings of 14th European Conference on Computer Vision–ECCV 2016, Amsterdam, The Netherlands, 2016; pp.312–329.
[24] Arnab A, Torr P H. Pixelwise instance segmentation with a dynamically instantiated network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp.441–450.
[25] Lee Y, Park J. Centermask: Real-time anchor-free instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp.13906–13915.
[26] Cai Z W, Vasconcelos N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019; 43(5): 1483–1498.
[27] Bolya D, Zhou C, Xiao F, Lee Y J. Yolact: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019; pp.9157–9166.
[28] Chen H, Sun K Y, Tian Z, Shen C H, Huang Y M, Yan Y L. Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp.8573–8581.
[29] Ying H, Huang Z, Liu S, Shao T J, Zhou K. Embedmask: Embedding coupling for one-stage instance segmentation. Arxiv preprint, 2019; arxiv: 1912.01954.
[30] Wang X L, Zhang R F, Kong T, Li L, Shen C H. Solov2: Dynamic and fast instance segmentation. Advances in Neural Information Processing Systems, 2020; 33: 17721–17732.
[31] Shojaiee F, Baleghi Y. EFASPP U-Net for semantic segmentation of night traffic scenes using fusion of visible and thermal images. Engineering Applications of Artificial Intelligence, 2023; 117: 105627.
[32] Kaur A, Goyal P, Rajhans R, Agarwal L, Goyal N. Fusion of multivariate time series meteorological and static soil data for multistage crop yield prediction using multi-head self-attention network. Expert Systems with Applications, 2023; 226: 120098.
[33] Yang Q L, Ye Y, Gu L C, Wu Y T. MSFCA-net: A multi-scale feature convolutional attention network for segmenting crops and weeds in the field. Agriculture, 2023; 13(6): 1176.
[34] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017; 30: 1–11.
[35] Gillioz A, Casas J, Mugellini E, Abou Khaled O. Overview of the Transformer-based Models for NLP Tasks. In 15th Conference on Computer Science and Information Systems (FedCSIS), IEEE, 2020; pp.179–183.
[36] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. Arxiv preprint, 2020; arxiv: 2010.11929.
[37] Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, et al. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp.10012–10022.
[38] Bao W X, Xie W J, Hu G S, Yang X J, Su B B. Wheat ear counting method in UAV images based on TPH-YOLO. Transactions of the CSAE, 2023; 39(1): 155–161. (in Chinese)
[39] Xu Y L, Kong S L, Chen Q Y, Gao Z Y, Li C X. Model for identifying strong generalization apple leaf disease using transformer. Transactions of the CSAE, 2022; 38(16): 198–206. (in Chinese)
[40] Wang C, Wu X H, Zhang Y Q, Wang W J. Recognizing weeds in maize fields using shifted window Transformer network. Transactions of the CSAE, 2022; 38(15): 133–142. (in Chinese)
[41] Fu L L, Huang H, Wang H, Huang S C, Chen D. Classification of maize growth stages using the Swin transformer model. Transactions of the CSAE, 2022; 38(14): 191–200.
[42] Zhu D L, Yu M S, Liang M F. Real-time instance segmentation of maize ears using SwinT-YOLACT. Transactions of the CSAE, 2023; 39(14): 164–172. (in Chinese)
[43] Liu X, Yi S, Li L, Cheng X H, Wang C. Semantic segmentation of terrace image regions based on lightweight CNN-transformer hybrid networks. Transactions of the CSAE, 2023; 39(13): 171–181. (in Chinese)
[44] Fang Y X, Yang S S, Wang X G, Li Y, Fang C, Shan Y, et al. Instances as queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp.6910–6919.
[45] Kirillov A, Wu Y, He K, Girshick R. Pointrend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp.9799–9808.
[46] Cai Z Y, Jin C, Da F. DMDC: Dynamic-mask-based dual camera design for snapshot Hyperspectral Imaging. Arxiv preprint, 2023; arxiv: 2308.01541.
Published
2024-09-06
How to Cite
Wang, J., Song, W., Zheng, W., Feng, Q., Wang, M., & Zhao, C. (2024). Spatial-channel transformer network based on mask-RCNN for efficient mushroom instance segmentation. International Journal of Agricultural and Biological Engineering, 17(4), 227–235. Retrieved from https://ijabe.migration.pkpps03.publicknowledgeproject.org/index.php/ijabe/article/view/8987
Section
Information Technology, Sensors and Control Systems
License
IJABE is an international peer reviewed open access journal, adopting Creative Commons Copyright Notices as follows.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).