Volume 5, Issue 2, 2026

Abstract

Automated grading has become an important component of digital transformation in K-12 education, yet the structured recognition of handwritten responses on answer sheets remains a practical challenge. General-purpose vision-language models often show limited robustness when applied directly to school assessment materials, particularly in the presence of fixed answer regions, mixed Chinese-English content, and diverse handwriting styles. To address this issue, this study develops a task-oriented fine-tuning framework for automated recognition of handwritten answer sheets in K-12 educational settings. A multimodal dataset was constructed from Chinese and English answer sheets, with region-level annotations designed to support structured text extraction. Based on this dataset, the Qwen2.5-VL-7B-Instruct model was adapted through LoRA-based fine-tuning under a dual-A16 GPU environment to reduce computational cost while preserving practical deployment feasibility. An end-to-end workflow covering data preparation, model training, weight merging, and inference was then established for structured JSON output. Experimental results show that the fine-tuned model achieved stable convergence in both small-sample and medium-sample settings and improved the extraction quality of handwritten responses within predefined answer regions. The proposed framework provides a practical and reproducible solution for deploying vision-language models in school grading scenarios with limited computing resources. The study also offers an application-oriented reference for the integration of multimodal large models into educational assessment systems.
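
For readers who want a concrete picture of the adaptation pattern this abstract describes, the following minimal sketch shows how LoRA fine-tuning and subsequent weight merging are typically wired together with the Hugging Face transformers and peft libraries. The rank, alpha, target modules, and output path are illustrative assumptions, not the authors' actual configuration.

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections; rank and alpha are placeholders.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)  # only the adapter weights are trainable

# ... supervised fine-tuning on region-annotated answer-sheet images goes here ...

# Fold the learned LoRA deltas back into the base weights for single-artifact deployment.
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-vl-answersheet-merged")
processor.save_pretrained("qwen2.5-vl-answersheet-merged")

At inference time, the merged model can then be prompted per answer region to emit a JSON object whose keys mirror the region-level annotations.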

Abstract

Small object detection in aerial imagery remains challenging due to limited spatial resolution, background clutter, and severe scale variation. Existing deep learning–based detectors often suffer from weakened shallow representations and insufficient cross-scale feature interaction, leading to missed detections and unstable localization in dense scenes. This work presents Dynamic Reconstruction and Fusion Network (DRF-Net), a frequency-guided feature reconstruction framework for small object detection. Built upon a one-stage detection paradigm, the proposed method introduces three key components: a frequency-guided channel–spatial augmentation (FCSA) module to enhance fine-grained representations, a multi-frequency reconstruction block (MFRB) to restore cross-scale structural information, and a Dynamic Reconstruction Fusion Neck (DRF-Neck) to adaptively regulate multi-scale feature aggregation. By jointly modeling high- and low-frequency components and integrating saliency-aware fusion mechanisms, the framework improves the preservation of small-object contours while suppressing redundant background responses. Extensive experiments conducted on the VisDrone2019 benchmark demonstrate that DRF-Net consistently outperforms the baseline detector in terms of detection accuracy, particularly for small and densely distributed objects, while maintaining real-time inference efficiency. Ablation studies further verify the complementary contributions of the proposed modules to feature representation and fusion stability. The results indicate that frequency-guided reconstruction and dynamic fusion provide an effective learning strategy for enhancing small-object detection performance in complex visual scenes.
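
As a rough intuition for the frequency-guided design sketched in this abstract, the toy PyTorch module below splits a feature map into low- and high-frequency parts with a 2-D FFT mask and reweights each with a learned channel gate. It is a generic illustration of high/low-frequency decomposition only and does not reproduce the paper's FCSA, MFRB, or DRF-Neck modules.

import torch
import torch.nn as nn

class FrequencySplitGate(nn.Module):
    # Toy frequency decomposition: low-pass the feature map in the Fourier
    # domain, treat the residual as the high-frequency (edge/detail) part,
    # and recombine with learnable per-channel gates.
    def __init__(self, channels, cutoff=0.25):
        super().__init__()
        self.cutoff = cutoff
        self.low_gate = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.high_gate = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):  # x: (B, C, H, W)
        _, _, H, W = x.shape
        freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        mask = ((yy.abs() <= self.cutoff) & (xx.abs() <= self.cutoff)).to(x.dtype)
        low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)), norm="ortho").real
        high = x - low  # fine detail that matters most for small objects
        return self.low_gate * low + self.high_gate * high

In a detector neck, a block like this would sit alongside saliency-aware fusion so that small-object contours carried by the high-frequency branch are preserved during multi-scale aggregation.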
Open Access
Research article
A Baseline Optical Character Recognition Framework for Printed Kashmiri Nastaliq Text Using Deep Learning
Sheikh Amir Fayaz,
Muzamil Majeed Khaja,
Abdul Saboor Bhat,
Danish Mansoor,
Anu Thapa,
Majid Zaman
Available online: 04-24-2026

Abstract


Optical Character Recognition (OCR) plays a crucial role in the digitization and preservation of textual information; however, for low-resource languages such as Kashmiri, reliable OCR solutions remain largely unavailable. Kashmiri, primarily written in the Perso-Arabic (Nastaliq) script, poses significant challenges due to its cursive structure, extensive use of ligatures, complex diacritical marks, and limited availability of annotated datasets. This research aims to address these challenges by developing a functional OCR system specifically tailored for Kashmiri text. The proposed system is built using the open-source Kraken OCR engine and leverages deep learning techniques with transfer learning from a pre-trained Arabic OCR model. A synthetic dataset was generated using Unicode Kashmiri text, enriched with Kashmiri-specific diacritics and exclusive characters, and rendered into images through automated text-to-image pipelines. Extensive preprocessing, augmentation, and iterative fine-tuning were performed to improve recognition accuracy. Model performance was evaluated using standard metrics such as Character Error Rate (CER) and Word Error Rate (WER) on both seen and unseen data. Experimental results demonstrate a substantial improvement over the initial model, with character accuracy increasing from 54.91% to 79.91% and word accuracy improving from 4.65% to 44.19%. The final model shows strong recognition capability for common and Arabic script characters, while Kashmiri-specific inherited diacritics remain a challenging area. In addition, a cross-platform user interface developed using Flutter enables users to upload or capture images and obtain digitized Kashmiri text through a simple and accessible workflow. Rather than proposing a new recognition architecture, this work contributes empirical insights, reproducible methodology, and error characterization for OCR in a previously unsupported low-resource Nastaliq language. This work is positioned as a baseline OCR system for printed Kashmiri Nastaliq text at the line level and does not claim state-of-the-art performance.
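
For context on the reported figures, Character Error Rate and Word Error Rate are edit distances normalized by the reference length at character and word level, respectively; character accuracy is commonly reported as 1 - CER. The short sketch below is a generic Python illustration of these metrics, not the evaluation code used in the study.

def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences, single-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

# Example: cer("answer", "anser") == 1/6, wer("two words", "two word") == 1/2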
