Multimodal Deep Learning Framework for Proactive Plant Disease Diagnosis
DOI: https://doi.org/10.63278/1419
Keywords: RGB Imaging, Hyperspectral Imaging, Thermal Imaging, Convolutional Neural Network (CNN), Vision Transformer (ViT), Real-Time Inference, Knowledge Distillation, Model Quantization.
Abstract
Early and accurate diagnosis of plant diseases is important for sustainable agriculture and for minimizing production losses. Traditional approaches to plant disease detection, which rely on manual inspection and single-modality imaging, are cumbersome, error-prone, and fail to capture the subtle characteristics of a disease. Recent advances in deep learning point toward automatic plant disease diagnosis; however, most current models still suffer from low generalization capability, high computational cost, and difficulty of real-time deployment. To address these difficulties, this article introduces a new multimodal deep learning framework that combines RGB, hyperspectral, and thermal imaging to improve the precision and efficiency of plant disease detection. The framework uses an EfficientNet-based CNN for spatial feature extraction from RGB images, a 1D-CNN for learning spectral features from hyperspectral data, and a Vision Transformer (ViT) for learning long-range contextual dependencies. The features from these sensors are fused by weighted summation, which dynamically adjusts the contribution of each modality to obtain robust and accurate predictions. To achieve real-time performance, the model is optimized via quantization, knowledge distillation, and model pruning, which substantially reduce its computational load. The final optimized model is deployed on an NVIDIA Jetson Nano to allow low-latency inference in support of high-precision agriculture. Experimental results show that the proposed multimodal framework achieves 97.8% accuracy, 96.5% precision, 95.7% recall, and a 96.1% F-score, all exceeding those of the baseline deep learning models ResNet-50, VGG-16, EfficientNet, and Vision Transformer (ViT). Moreover, the framework performs inference in 20 milliseconds, which makes it suitable for real-time applications.
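The weighted-summation fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the per-modality weights are produced, so the sketch assumes each modality yields a feature vector of the same length and a scalar relevance score, and that the scores are normalized with a softmax. The names `fuse_weighted_sum`, `features`, and `scores` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_weighted_sum(features, scores):
    """Fuse per-modality feature vectors by a weighted sum.

    features: dict of modality name -> list of floats (all the same length)
    scores:   dict of modality name -> unnormalized relevance score
              (assumption: in a trained model these would be learned
              or predicted per input; here they are given by hand)
    Returns the fused vector and the normalized weights.
    """
    names = sorted(features)
    dims = {len(features[name]) for name in names}
    if len(dims) != 1:
        raise ValueError("all modality features must share one dimension")

    # Softmax keeps the weights positive and summing to 1.
    weights = softmax([scores[name] for name in names])

    fused = [0.0] * dims.pop()
    for weight, name in zip(weights, names):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return fused, dict(zip(names, weights))


# Toy feature vectors standing in for the three encoder outputs.
features = {
    "rgb": [1.0, 0.0],
    "hyperspectral": [0.0, 1.0],
    "thermal": [1.0, 1.0],
}
# Equal scores give each modality the same weight (1/3).
scores = {"rgb": 0.0, "hyperspectral": 0.0, "thermal": 0.0}
fused, weights = fuse_weighted_sum(features, scores)
```

Raising the score of one modality shifts weight toward it, which is one simple way to make the contribution of each modality depend on the input.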
Successfully integrating multimodal data fusion with model optimization not only increases classification performance but also makes the solution practical and deployable in real-world agricultural environments. The proposed framework offers a promising solution for smart farming, enabling early disease detection and effective crop management.
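Knowledge distillation, one of the optimization steps named in the abstract, is often formulated as a weighted sum of a soft-target term and a hard-label term. The sketch below shows that common formulation for a single sample. It is an assumption, not the loss used in the paper: the abstract gives no loss, temperature, or weighting, so `distillation_loss`, `temperature`, and `alpha` are illustrative names and values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Single-sample distillation loss (common formulation, assumed here).

    loss = alpha * T^2 * KL(teacher_T || student_T)
           + (1 - alpha) * cross_entropy(label, student)
    """
    # Soft-target term: KL divergence between softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)

    # Hard-label term: cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# A student that matches the teacher pays only the hard-label term.
matched = distillation_loss([0.0, 0.0], [0.0, 0.0], label=0)
# A student that disagrees with the teacher pays an extra soft-target penalty.
mismatched = distillation_loss([0.0, 0.0], [2.0, 0.0], label=0)
```

In this formulation, a larger `alpha` pushes the compact student model to follow the teacher's output distribution more closely, while a smaller `alpha` emphasizes the ground-truth labels.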
License
Copyright (c) 2025 Purshottam J. Assudani, V. Rama Krishna

This work is licensed under a Creative Commons Attribution 4.0 International License.