WANG Hui, PAN Xiao, WANG Shuhai, CHEN Xiao, LI Ning, WANG Zuocheng
[Objectives] Sea Surface Temperature (SST) is a key factor in marine ecological balance and global climate regulation, and its long-term prediction is vital for marine disaster early warning, resource development, and ecological protection. However, SST prediction faces two challenges. SST series exhibit non-stationary fluctuations across multiple time scales, and they are governed by complex nonlinear interactions among multiple environmental variables. Traditional numerical models rely on complex physical equations and incur high computational costs. Existing deep learning models often fail to fully capture multi-scale dynamic features or neglect synergistic effects between variables, which limits their long-term prediction performance in complex marine environments. This study aims to address these issues and improve the accuracy and robustness of long-term SST prediction. [Methods] A time series prediction model named ACAFNet is proposed, which integrates multi-scale temporal feature modeling with adaptive mining of variable interactions. First, the SST time series undergoes seasonal-trend decomposition, in which the Fast Fourier Transform (FFT) extracts periodic patterns and weighted average pooling captures long-term trends. The Top-K key scales are then selected dynamically to match the series' inherent multi-scale features. A dual-attention mechanism captures both local fine-grained fluctuations and global long-range dependencies, addressing the non-stationarity of marine data. Second, the variables are mapped from the time domain to the frequency domain via FFT to reveal correlations that are obscured in the time domain. A learnable Mahalanobis distance quantifies the correlations between variables and generates a sparse mask matrix that emphasizes key predictive variables and suppresses noise. Finally, a fusion module combines the multi-scale features and variable dependencies through masked multi-head attention, together with layer normalization and residual connections, to produce robust predictions.
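The Methods description names several concrete operations. The NumPy sketch below illustrates four of them under stated assumptions: picking dominant periods from the FFT amplitude spectrum, a weighted moving-average seasonal-trend split, a sparse variable mask built from a Mahalanobis distance between the variables' frequency spectra, and attention restricted by that mask. The function names, the `period = T // frequency_bin` rule, the triangular pooling weights, the factorization `M = L @ L.T`, and the keep-nearest-neighbours sparsification are all hypothetical choices made for illustration. They are not ACAFNet's published implementation.

```python
import numpy as np


def top_k_periods(x: np.ndarray, k: int = 3) -> list:
    """Return the k dominant periods of a 1-D series from its FFT amplitude spectrum.

    Assumption: period = len(x) // frequency_bin, a common heuristic; the
    paper's exact scale-selection rule may differ.
    """
    amp = np.abs(np.fft.rfft(x - x.mean()))[1:]  # drop the DC (mean) bin
    bins = np.argsort(amp)[::-1][:k] + 1          # strongest frequency bins (1-based)
    return [len(x) // int(b) for b in bins]


def decompose(x: np.ndarray, window: int = 25):
    """Split x into (trend, seasonal) with a weighted moving average.

    The triangular weights are a hypothetical stand-in for the paper's
    'weighted average pooling'; the series ends are padded by repetition.
    """
    half = window // 2
    w = np.concatenate([np.arange(1, half + 2), np.arange(half, 0, -1)]).astype(float)
    w /= w.sum()                                  # weights sum to 1
    padded = np.pad(x, (half, half), mode="edge")
    trend = np.convolve(padded, w, mode="valid")  # same length as x
    return trend, x - trend


def mahalanobis_mask(X: np.ndarray, L: np.ndarray, keep: int = 2) -> np.ndarray:
    """Boolean (C, C) mask: each variable keeps itself plus its `keep` nearest peers.

    X is a (T, C) multivariate series. Distances are computed between the
    variables' FFT amplitude spectra under the metric M = L @ L.T, which keeps
    M positive semi-definite; in a trained model L would be a learnable parameter.
    """
    spec = np.abs(np.fft.rfft(X, axis=0)).T       # (C, F) amplitude spectra
    z = spec @ L                                  # ||z_i - z_j||^2 = (f_i - f_j) M (f_i - f_j)^T
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # (C, C) Mahalanobis distances
    np.fill_diagonal(dist, np.inf)                # exclude self from the ranking
    nearest = np.argsort(dist, axis=1)[:, :keep]
    mask = np.eye(len(dist), dtype=bool)          # every variable attends to itself
    np.put_along_axis(mask, nearest, True, axis=1)
    return mask


def masked_attention(Q, K, V, mask):
    """Single-head scaled dot-product attention over variables, restricted by mask."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # pruned variable pairs get zero weight
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```

For example, a synthetic hourly-like series `20 + 5 * np.sin(2 * np.pi * t / 96) + 0.01 * t` with 480 samples yields `top_k_periods(sst, k=1) == [96]`. Because the diagonal of the mask is always kept, every row of `masked_attention` has at least one finite score, so the softmax is well defined.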
[Results] Comparative experiments were conducted on one private dataset (collected from anchored buoys in the coastal waters of Qinhuangdao, Bohai Sea) and three cross-latitude public buoy datasets (52212, NTKM3, PRDA2) covering tropical to subarctic regions. ACAFNet was compared against five baseline models (DLinear, Pathformer, PatchTST, Crossformer, GPT4TS) at four prediction horizons (96, 168, 336, and 720 steps). Compared with the Transformer-based models, ACAFNet reduces MSE, MAE, and RMSE by an average of 3.72%, 5.03%, and 4.17%, respectively. Notably, for 720-step long-term prediction on the private dataset, ACAFNet achieves an MSE of 0.299, an MAE of 0.399, and an RMSE of 0.547, outperforming all baselines. Ablation experiments further verify the contribution of the adaptive scale selection, dual-attention, and variable correlation measurement modules to model performance. [Conclusions] ACAFNet effectively improves the accuracy and robustness of long-term SST prediction through adaptive multi-scale division, a dual-attention mechanism, and frequency-domain measurement of variable correlations. It addresses the core challenges of capturing multi-scale fluctuations and mining nonlinear variable interactions, providing a new paradigm for marine multivariate time series prediction. The study offers a useful reference for forecasting in complex marine environments and lays a foundation for future extensions to marine ecological variable prediction and multi-modal data fusion.
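The three error metrics in the Results are related. When all three are computed over the same set of forecast errors, RMSE is the square root of MSE, which agrees with the reported 720-step values, since the square root of 0.299 rounds to 0.547. The sketch below computes the metrics and expresses an improvement as a percentage reduction. The definition `(baseline - ours) / baseline` is an assumption about how the reported averages were obtained. The abstract does not state whether they are averaged over datasets, horizons, or both.

```python
import numpy as np


def metrics(y_true, y_pred) -> dict:
    """MSE, MAE and RMSE over all forecast points (any array shape)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "MAE": float(np.mean(np.abs(err))), "RMSE": mse ** 0.5}


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to a baseline (assumed definition)."""
    return 100.0 * (baseline - ours) / baseline


# Consistency check on the reported 720-step result: sqrt(0.299) rounds to 0.547.
print(round(0.299 ** 0.5, 3))  # 0.547
```

Note that averaging RMSE across several datasets is not the same as taking the square root of an averaged MSE. This is one reason the three average percentages in the abstract need not coincide.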