YAN Qiuyu, WANG Shu, HUA Yixin, ZHANG Jiangshui
[Objectives] Fine-grained object recognition in remote sensing is a fundamental yet highly challenging task in both Earth observation and computer vision. It involves the accurate localization and detailed classification of objects in High-Spatial-Resolution (HSR) imagery, which often features highly complex backgrounds, inter-class similarity, and intra-class variation. In recent years, notable progress has been driven by algorithms that jointly exploit pixel-level, object-level, and neighborhood-level information. These approaches combine semantic features, texture characteristics, and spatial contextual relationships to form multi-source and multi-scale feature representations. Despite these advances, existing methods still cannot directly exploit higher-level fine-grained knowledge such as scene composition, entity semantics, attribute descriptions, and temporal dynamics. The core limitation is the absence of a formalized knowledge organization and representation paradigm capable of systematically bridging low-level visual perception and higher-order semantic reasoning. [Methods] To address these limitations, this study proposes a multi-level knowledge graph-based organization and representation framework specifically designed for fine-grained remote sensing object recognition. The framework adopts a four-layer hierarchical structure encompassing scene, entity, feature, and change dimensions, enabling dynamic and semantically rich descriptions of remote sensing targets. In this structure, scene nodes provide contextual constraints, entity nodes capture the essential semantics of objects, feature nodes encode visual and semantic attributes, and change nodes represent temporal evolution. [Results] By incorporating spatiotemporal references, spatial morphology, and inter-object relationships, the proposed approach enables knowledge organization under multiple constraints, including scene, entity, feature, and temporal conditions.
In doing so, it moves beyond purely data-driven perception and establishes a mechanism for knowledge-driven reasoning in remote sensing interpretation. Extensive experiments were conducted to validate the effectiveness of the proposed framework. When integrated into the baseline model STD, the knowledge graph yielded an improvement of approximately 3.82% in mean Average Precision (mAP) and 3.92% in recall, demonstrating its ability to enhance detection accuracy. Beyond this single case, the generality and robustness of the framework were confirmed by consistent performance improvements across several representative neural networks, including Oriented R-CNN, Oriented RepPoints, LSKNet, and STD. These results indicate that the proposed method not only improves recognition performance but also enhances interpretability and adaptability across heterogeneous architectures and datasets. [Conclusions] Overall, this study demonstrates that a multi-level knowledge graph provides an effective pathway for advancing fine-grained object recognition in remote sensing from feature perception to knowledge reasoning. Alongside higher recognition accuracy, the method improves semantic interpretability and dynamic adaptability, offering a scalable solution for intelligent remote sensing analysis. It also provides new theoretical and practical insights for applications in geospatial information extraction, environmental and urban monitoring, disaster assessment, and military intelligence analysis. By systematically integrating structured knowledge with data-driven models, the proposed framework enriches the semantic depth of remote sensing interpretation and shows strong potential for future developments in intelligent Earth observation systems.
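As an illustration only, the four-layer organization described in the Methods (scene, entity, feature, and change nodes) can be sketched as a small typed graph. The abstract does not specify a schema, so every name below is an assumption made for this sketch: the `KnowledgeGraph` class, the relation labels `contains`, `has_feature`, and `undergoes`, the example nodes, and the `filter_by_scene` rule are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """The four layers named in the abstract."""
    SCENE = "scene"
    ENTITY = "entity"
    FEATURE = "feature"
    CHANGE = "change"


@dataclass
class KnowledgeGraph:
    """Hypothetical container: typed nodes plus (head, relation, tail) triples."""
    nodes: dict = field(default_factory=dict)  # node name -> Layer
    edges: set = field(default_factory=set)    # (head, relation, tail)

    def add_node(self, name, layer):
        self.nodes[name] = layer

    def add_edge(self, head, relation, tail):
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((head, relation, tail))

    def neighbors(self, head, relation):
        """Tails reachable from `head` via `relation`, sorted for determinism."""
        return sorted(t for h, r, t in self.edges if h == head and r == relation)


def build_example():
    """Toy graph; node and relation names are invented for illustration."""
    kg = KnowledgeGraph()
    kg.add_node("airport", Layer.SCENE)
    kg.add_node("harbor", Layer.SCENE)
    kg.add_node("airliner", Layer.ENTITY)
    kg.add_node("cargo_ship", Layer.ENTITY)
    kg.add_node("swept_wings", Layer.FEATURE)
    kg.add_node("parked_to_absent", Layer.CHANGE)
    kg.add_edge("airport", "contains", "airliner")
    kg.add_edge("harbor", "contains", "cargo_ship")
    kg.add_edge("airliner", "has_feature", "swept_wings")
    kg.add_edge("airliner", "undergoes", "parked_to_absent")
    return kg


def filter_by_scene(kg, scene, candidates):
    """Assumed rule: keep only candidate labels the scene node contains."""
    allowed = set(kg.neighbors(scene, "contains"))
    return [c for c in candidates if c in allowed]


if __name__ == "__main__":
    kg = build_example()
    print(filter_by_scene(kg, "airport", ["cargo_ship", "airliner"]))
```

In this toy setting, the scene node acts as a contextual constraint by discarding candidate labels that are inconsistent with the scene. How the actual framework couples such constraints with a detector such as STD is not described in the abstract and is not modeled here.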