Chlorophyll-a Concentration Inversion Model: Stacked Auto-encoder Particle Swarm Optimization BP Neural Network

HAN Baohui; ZHAO Qichao; CHANG Rong; LI Xiaomeng; YAN Keqin; FU Qiming

doi:10.12082/dqxxkx.2023.230144

Journal of Geo-information Science, 2023, Vol. 25, Issue 9: 1882-1893


HAN Baohui^1,2, ZHAO Qichao^1,2,*, CHANG Rong^1,3, LI Xiaomeng^1,2, YAN Keqin^1,2, FU Qiming^1,2


1. North China Institute of Aerospace Technology, Langfang 065000, China
2. Hebei Collaborative Innovation Center of Aerospace Remote Sensing Information Processing and Application, Langfang 065000, China
3. Bureau of Ecological Environment, Xiongan New Area Administrative Committee, Hebei Province, Baoding 071700, China

* ZHAO Qichao, E-mail: theoddone1987@163.com

Received date: 2023-03-23

Revised date: 2023-06-05

Online published: 2023-09-05

Supported by

Major National Science and Technology Project of the High Resolution Earth Observation System (67-Y50G04-9001-22/23)

Major National Science and Technology Project of the High Resolution Earth Observation System (67-Y50G05-9001-22/23)

Science and Technology Research Project of the Education Department of Hebei Province (CXY2023011)

Science and Technology Research Project of the Education Department of Hebei Province (QN2022076)


Abstract

Chlorophyll-a (Chl-a) concentration is a main indicator of eutrophication in inland waters. It is also one of the important factors that shape the reflectance spectrum of water. Monitoring Chl-a concentration in inland water bodies provides valuable information for managing and mitigating the effects of eutrophication.

In this study, hyperspectral data and water samples were collected in Baiyangdian Lake, in Shaochedian and the waters around the villages of Quantou Township. Water quality parameters such as Chl-a were determined in the laboratory and used for hyperspectral remote sensing inversion of Chl-a in the Baiyangdian region. Four models were established:

- the stacked auto-encoder particle swarm optimization BP neural network (SAE-PSO-BP) model;
- a BP neural network model using hyperspectral data without dimensionality reduction;
- a BP neural network model with dimensionality reduction by principal component analysis (PCA-BP);
- a BP neural network model with dimensionality reduction by stepwise regression analysis (SR-BP).

Linear dimensionality reduction methods have limited feature extraction ability. Neural network models built for hyperspectral Chl-a inversion also suffer from low learning efficiency and poor generalization. To address these problems, a Chl-a concentration inversion model based on the SAE-PSO-BP neural network was proposed.

The model uses the strong nonlinear transformation ability of the stacked auto-encoder to learn features of hyperspectral data by minimizing the reconstruction error. It reduces data dimensionality while preserving as much of the radiometric information in the original spectra as possible, and it extracts deep features of the measured water spectra. The initial weights of the BP neural network were taken as the position vector of each particle. The particle swarm optimization algorithm then searched for the optimal initial weights, which reduces the probability of falling into local extrema and improves model stability and inversion accuracy.

On the training set, the SAE-PSO-BP model (R²=0.82, RMSE=2.65 μg/L, MAE=1.89 μg/L) achieved higher accuracy than the three comparison models:

- BP neural network without dimensionality reduction: R²=0.75, RMSE=3.16 μg/L, MAE=2.39 μg/L;
- PCA-BP: R²=0.79, RMSE=2.85 μg/L, MAE=2.29 μg/L;
- SR-BP: R²=0.80, RMSE=2.79 μg/L, MAE=2.38 μg/L.

This study provides a theoretical basis and technical support for hyperspectral remote sensing inversion of Chl-a in inland Case II waters. It supports continuous water quality monitoring of Baiyangdian Lake and offers new ideas for future Chl-a inversion from hyperspectral satellite imagery.

Key words: measured spectrum; stacked auto-encoder; particle swarm optimization algorithm; BP neural network; water quality detection; chlorophyll-a retrieval; data dimensionality reduction; feature extraction

Cite this article

HAN Baohui, ZHAO Qichao, CHANG Rong, LI Xiaomeng, YAN Keqin, FU Qiming. Chlorophyll-a Concentration Inversion Model: Stacked Auto-encoder Particle Swarm Optimization BP Neural Network[J]. Journal of Geo-information Science, 2023, 25(9): 1882-1893. DOI: 10.12082/dqxxkx.2023.230144

1 Introduction

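Chlorophyll-a (Chl-a) effectively reflects the degree of eutrophication of water bodies^[1-2]. It is also an important indicator for estimating algal biomass and evaluating the primary productivity of water bodies^[3-6].

Inland waters contain constituents such as suspended solids (SS) and colored dissolved organic matter (CDOM)^[7]. As a result, their apparent and inherent optical properties are more complex than those of open-ocean waters, and Chl-a is related to the reflectance spectrum in a complex, nonlinear way. Remote sensing inversion of Chl-a in inland waters has therefore long been a central problem in the field.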

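Hyperspectral remote sensing data contain many spectral bands and can characterize the optical properties of inland waters more completely, which helps improve the accuracy of Chl-a inversion. Machine learning algorithms, represented by neural networks, can effectively approximate nonlinear models^[8-11]. They have proven effective for handling the complex nonlinear relationship between Chl-a and the reflectance spectrum of inland waters and have been widely applied in recent years^[12-15].

However, building a neural network directly on hyperspectral data is computationally expensive. The information redundancy of hyperspectral data also tends to cause overfitting and the "curse of dimensionality"^[16], which lowers model accuracy. Researchers therefore usually reduce dimensionality before building neural network models on hyperspectral data. Common methods are principal component analysis (PCA)^[17-19] and stepwise regression (SR)^[20-22].

These methods compress the data, but in Chl-a inversion for inland waters the reduction step tends to lose feature information, which hampers construction of the neural network model.

- PCA analyzes the covariance matrix and keeps the features with the largest variance^[23]. This does not guarantee that the reduced features are more predictive of Chl-a.
- SR selects characteristic bands by analyzing, band by band, the linear relationship between the hyperspectral data and the dependent variable, which removes redundant information. Because Chl-a and the reflectance spectrum are related nonlinearly, however, SR tends to discard feature information to which nonlinear models are sensitive.

Chl-a inversion in inland waters therefore calls for a dimensionality reduction method that preserves as much hyperspectral feature information as possible while meeting the needs of neural network modeling.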

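This paper therefore proposes a Chl-a remote sensing inversion model based on a stacked auto-encoder particle swarm optimization BP neural network (SAE-PSO-BP).

- **Feature learning.** The model uses the strong nonlinear transformation ability of the stacked auto-encoder (SAE) to learn hyperspectral features by minimizing the reconstruction error. This reduces data dimensionality while preserving as much of the water radiometric information in the original spectra as possible.
- **Network optimization.** Too many input parameters make a BP neural network learn inefficiently and fall easily into local optima. To counter this, particle swarm optimization (PSO) is used to optimize the BP neural network, and the resulting model is used to invert Chl-a.

Baiyangdian in Xiong'an New Area is the study area. Measured hyperspectral data and water quality parameter data were collected systematically to build the SAE-PSO-BP model. Three comparison models were also built:

- a BP neural network without dimensionality reduction;
- a PCA-based BP neural network (PCA-BP);
- an SR-based BP neural network (SR-BP).

Comparing these models shows how well SAE-PSO-BP suits the task. The aim is to offer a new approach for inverting Chl-a in inland waters from hyperspectral data.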

2 Study Area and Data Sources

2.1 Study Area

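Baiyangdian (115°45′E-116°07′E, 38°44′N-38°59′N) lies in the heart of Xiong'an New Area in central Hebei Province. It is a lake of the southern branch of the Daqing River system in the Haihe River Basin^[24], with a total water area of 366 km² and a storage of 1.32 billion m³ in an average year. As the largest natural shallow depression lake on the North China Plain, it is known as the "Kidney of North China".

In recent years, comprehensive environmental management of Baiyangdian has made clear progress, and its water environment carrying capacity has improved markedly. Eutrophication and rural non-point-source pollution nevertheless persist to some degree, so monitoring still needs to be strengthened.

Within Baiyangdian, Shaochedian is the main scenic area, and the villages of Quantou Township lie at the center of the lake. Both areas are densely populated and support intensive production and daily activities, which makes them representative inland water areas. This study therefore takes the waters of Shaochedian and the villages of Quantou Township as its study area. The distribution of water sampling points is shown in Fig. 1.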


Fig. 1 Distribution of water sampling points

2.2 Data Acquisition

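On 22 September 2021, 20 sets of water spectra were measured in Baiyangdian and 20 water samples were collected at the same time. Sampling coordinates were recorded with GPS.

- **Spectra.** Water spectra were measured with the above-water method^[25] using an ASD HandHeld 2 handheld spectroradiometer. Measurements were made between 10:00 and 14:00 Beijing time under clear skies and calm water.
- **Water samples.** At each spectral measurement point, a water sample was taken 20-30 cm below the surface. Chl-a was determined in the laboratory with an L5S UV-Vis spectrophotometer^[26].

Table 1 lists the Chl-a value at each point. Chl-a ranges from 4 to 27 μg/L, with a mean of 12.45 μg/L.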

Tab. 1 Statistics of water sampling points

Point	Nearby village	Longitude (E)	Latitude (N)	Chl-a (μg/L)
1	Zhainan Village	115°59′17.9*″	38°54′3.0*″	16
2	Nilizhuang Village	115°58′59.3*″	38°54′29.0*″	20
3	Gazi Village	115°58′47.7*″	38°54′42.2*″	21
4	Gazi Village	115°58′50.7*″	38°54′47.6*″	16
5	Gazi Village	115°58′53.9*″	38°54′53.2*″	19
6	Xiaozhangzhuang Village	115°58′50.7*″	38°55′13.1*″	9
7	Dazhangzhuang Village	115°59′8.1*″	38°55′30.9*″	6
8	Dazhangzhuang Village	115°59′47.7*″	38°55′34.9*″	7
9	Guolikou Village	116°0′16.0*″	38°55′56.2*″	8
10	Guolikou Village	116°0′45.6*″	38°56′3.4*″	8
11	Guolikou Village	116°0′31.0*″	38°55′33.8*″	4
12	Wangjiazhai Village	115°59′50.5*″	38°54′28.8*″	5
13	Zhainan Village	115°59′55.2*″	38°54′18.8*″	13
14	Zhainan Village	115°59′54.7*″	38°54′7.7*″	7
15	Zhainan Village	115°59′50.7*″	38°53′36.8*″	11
16	Dongdiantou Village	116°0′10.9*″	38°53′12.6*″	10
17	Dongdiantou Village	116°0′6.1*″	38°53′6.5*″	12
18	Dongdiantou Village	115°59′58.0*″	38°52′51.2*″	15
19	Dongdiantou Village	116°0′1.1*″	38°52′48.1*″	27
20	Dongdiantou Village	116°0′15.3*″	38°52′51.9*″	15

Note: * masks the detailed latitude and longitude values.

3 Construction of the SAE-PSO-BP Model

3.1 Methods

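Chl-a inversion for inland waters in this study involves two processes: dimensionality reduction of the hyperspectral data, then prediction of Chl-a (Fig. 2).

1. The hyperspectral data and water quality parameter data used for inversion are normalized. An SAE network then extracts deep features from the measured spectra, which reduces the dimensionality of the hyperspectral data and yields the feature spectra used for inversion.
2. The strong self-learning and adaptive ability of the BP neural network is used to map the nonlinear relationship between the spectral features and Chl-a. The PSO algorithm optimizes the initial weights and thresholds of the BP network, which improves the stability of the inversion and gives high-accuracy Chl-a inversion.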


Fig. 2 Flowchart of Chl-a inversion with the stacked auto-encoder particle swarm optimization BP neural network

3.2 Spectral Analysis

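The spectra measured in the field were preprocessed in ViewSpecPro: outliers were removed from the spectral curves, and the remaining curves were averaged^[27]. As Fig. 3 shows, every spectral curve has distinct peaks and troughs:

- **400-500 nm:** remote sensing reflectance is low because of Chl-a absorption^[28].
- **Near 580 nm:** reflectance rises with wavelength to a first distinct peak. The peak results from weak absorption by Chl-a and carotenoids in algal cells combined with scattering by the cells.
- **630 nm:** a first trough forms because of phycocyanin absorption.
- **670 nm:** a second trough forms because of strong, algae-dominated Chl-a absorption.
- **690 nm:** a prominent peak appears. It is the fluorescence peak produced by Chl-a fluorescence^[29].
- **Beyond 725 nm:** reflectance drops rapidly.
- **810-820 nm:** a small peak appears, caused jointly by backscattering from suspended matter and the pure-water absorption trough.

The spectra of the study area clearly show the characteristics of inland waters. CDOM in inland waters alters the optical properties of the water^[30]. Its strong absorption in the near-ultraviolet extends into the visible range and weakens as wavelength increases^[31].

The input window was chosen accordingly. Bands below 550 nm are excluded to reduce CDOM interference with Chl-a inversion in Baiyangdian. Beyond 900 nm, the spectra show no clear correlation with Chl-a. The measured spectra from 550 to 900 nm are therefore used as the input variables of the SAE-PSO-BP model.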
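The preprocessing and band selection above can be sketched in Python as follows. This is a minimal illustration, not the authors' ViewSpecPro workflow, and it rests on three assumptions:

- spectra are exported at 1 nm steps over 325-1075 nm;
- the outlier rule is hypothetical: it drops scans far from the median spectrum, with factor `k` standing in for the unspecified criterion;
- the replicate scans are random placeholders.

With 1 nm sampling, the 550-900 nm window contains 351 bands, which matches the 351-neuron SAE input layer in Section 3.3.

```python
import numpy as np

def remove_outlier_scans(spectra, k=2.0):
    """Drop replicate scans whose RMS distance to the median spectrum exceeds
    k times the median of those distances (hypothetical outlier rule)."""
    median_spectrum = np.median(spectra, axis=0)
    dist = np.sqrt(((spectra - median_spectrum) ** 2).mean(axis=1))
    return spectra[dist <= k * np.median(dist)]

def select_window(wavelengths, spectrum, start_nm=550, end_nm=900):
    """Keep only the bands inside [start_nm, end_nm]."""
    mask = (wavelengths >= start_nm) & (wavelengths <= end_nm)
    return wavelengths[mask], spectrum[mask]

# Assumption: one reading per nm from 325 to 1075 nm
wavelengths = np.arange(325, 1076)
rng = np.random.default_rng(0)
scans = rng.random((5, wavelengths.size))        # 5 hypothetical replicate scans
mean_spectrum = remove_outlier_scans(scans).mean(axis=0)
wl, rrs = select_window(wavelengths, mean_spectrum)
print(wl.size, wl[0], wl[-1])                    # 351 550 900
```

Averaging only the retained scans means a single bad reading, such as one affected by sun glint, cannot pull the mean curve away from the others.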


Fig. 3 Measured spectral curves of the water body

3.3 Dimensionality Reduction of Hyperspectral Data with the SAE

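The auto-encoder (AE) is an unsupervised learning method built from artificial neural networks^[32]. It consists of an encoder and a decoder; Fig. 4 shows its structure. The notation is:

- $x = \{x_1, x_2, \cdots, x_n\}$: the normalized input spectral data;
- $h = \{h_1, h_2, \cdots, h_m\}$: the feature spectra output by the hidden layer;
- $y = \{y_1, y_2, \cdots, y_n\}$: the normalized spectral data output by the auto-encoder.

During encoding and decoding, the normalized spectral data are transformed as follows:

$$h = f(W_y x + b_y) \tag{1}$$

$$y = g(W_z h + b_z) \tag{2}$$

where:

- $f(\cdot)$ and $g(\cdot)$ are activation functions; this model uses the sigmoid function;
- $W_y$ and $b_y$ are the weights and thresholds of the neurons in the encoding process;
- $W_z$ and $b_z$ are the weights and thresholds of the neurons in the decoding process.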


Fig. 4 Auto-encoder network structure

Note: x₁ to x_n and y₁ to y_n are normalized spectral data; h₁ to h_m are the feature spectra.

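Training an AE amounts to using the back propagation algorithm to repeatedly adjust the parameters $\{W_y, W_z\}$ and $\{b_y, b_z\}$ so as to minimize the reconstruction error of the spectral data. In this paper, the reconstruction error is the mean squared error (MSE).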
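To make Eqs. (1)-(2) and the MSE objective concrete, the following NumPy sketch trains a single sigmoid auto-encoder by full-batch gradient descent.

- The function names, learning rate, initialization scale and epoch count are illustrative assumptions, not the authors' settings.
- The input matrix is a random placeholder for normalized spectra.
- Rows of `X` are samples, so the code computes the row-vector form of Eq. (1), h = f(x W_y + b_y).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=300, lr=1.0, seed=0):
    """Fit one sigmoid auto-encoder (Eqs. 1-2) to X by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W_y = rng.normal(0.0, 0.1, (n_in, n_hidden))   # encoder weights
    b_y = np.zeros(n_hidden)                       # encoder thresholds
    W_z = rng.normal(0.0, 0.1, (n_hidden, n_in))   # decoder weights
    b_z = np.zeros(n_in)                           # decoder thresholds
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_y + b_y)                 # Eq. (1), row-vector form
        Y = sigmoid(H @ W_z + b_z)                 # Eq. (2), row-vector form
        err = Y - X
        losses.append(float(np.mean(err ** 2)))    # reconstruction MSE
        # Backpropagate the MSE through the decoder and encoder sigmoids
        dY = 2.0 * err / err.size * Y * (1.0 - Y)
        dH = (dY @ W_z.T) * H * (1.0 - H)
        W_z -= lr * H.T @ dY
        b_z -= lr * dY.sum(axis=0)
        W_y -= lr * X.T @ dH
        b_y -= lr * dH.sum(axis=0)
    return (W_y, b_y, W_z, b_z), losses

def encode(params, X):
    """Return the hidden-layer feature spectra h for each row of X."""
    W_y, b_y, _, _ = params
    return sigmoid(X @ W_y + b_y)

X = np.random.default_rng(1).random((13, 50))      # placeholder normalized spectra
params, losses = train_autoencoder(X, n_hidden=8)
print(encode(params, X).shape)                     # (13, 8)
```

After training, the decoder weights are no longer needed: only `encode` is used to produce the reduced feature spectra.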

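The SAE is an unsupervised deep learning network formed by stacking several AEs layer by layer^[33] (Fig. 5). It can extract deep features of hyperspectral data and reduce their dimensionality.

- **Layer sizes.** In the encoder, the number of neurons decreases from layer to layer. In the decoder, it increases from layer to layer.
- **Layer-wise training.** In the encoder, the output of each layer serves as the input of the next, until the last hidden layer has been trained. The output of that last hidden layer is the feature spectrum extracted by the SAE.

Trained this way, the SAE can greatly reduce the dimensionality of the measured spectra and remove redundancy from the hyperspectral data. This lowers the likelihood of overfitting in the subsequent neural network modeling.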


Fig. 5 Stacked auto-encoder network structure

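To speed up SAE training, the hyperspectral data are first normalized with the min-max method of Eq. (3):

$$\hat{x} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{3}$$

where $x$ is the original measured spectral data, $\hat{x}$ is the normalized data, and $x_{max}$ and $x_{min}$ are the maximum and minimum of that set of measured spectral data. After normalization, both the 550-900 nm measured spectra and Chl-a lie within [0, 1].

From the normalized data, 13 samples were randomly selected as the training set and the remaining 7 as the validation set. The SAE network first reduces the dimensionality of the input hyperspectral data. The 15 extracted feature spectra are then used by the PSO-optimized BP neural network to invert Chl-a.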
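The following sketch applies Eq. (3) to the 20 Chl-a values in Table 1 and draws a random 13/7 split.

- The random seed and the helper names `min_max` and `inverse_min_max` are hypothetical.
- The paper does not say whether the spectra were scaled per band or over the whole data set, so only the Chl-a vector is normalized here.
- The inverse transform is included because predictions made in [0, 1] have to be mapped back to μg/L before errors can be reported in those units.

```python
import numpy as np

# Chl-a (μg/L) at sampling points 1-20, from Table 1
chla = np.array([16, 20, 21, 16, 19, 9, 6, 7, 8, 8,
                 4, 5, 13, 7, 11, 10, 12, 15, 27, 15], dtype=float)

def min_max(x):
    """Eq. (3): map x linearly onto [0, 1]; also return (x_min, x_max)."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def inverse_min_max(x_hat, bounds):
    """Map normalized values back to the original units."""
    x_min, x_max = bounds
    return x_hat * (x_max - x_min) + x_min

chla_hat, bounds = min_max(chla)
print(float(chla.min()), float(chla.max()), float(chla.mean()))  # 4.0 27.0 12.45

rng = np.random.default_rng(2023)            # hypothetical seed
order = rng.permutation(chla.size)
train_idx, val_idx = order[:13], order[13:]  # 13 training / 7 validation samples
```

The printed range and mean reproduce the 4-27 μg/L range and 12.45 μg/L mean stated in Section 2.2.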

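There is as yet no complete theoretical basis for choosing the SAE structure. Parameter settings are therefore adjusted to the data and the application, and the chosen SAE topology strongly affects the final predictions.

- **Number of layers.** Too many layers degrade model performance and cause overfitting. Too few layers fit the true function poorly and cannot learn higher-order, more abstract features.
- **Number of hidden nodes.** Too few hidden nodes lead to insufficient learning. Too many overload the network and lengthen training.

After repeated experiments, this study set the SAE structure to 351-234-156-102-68-34-15. With 50 training epochs, this structure effectively reduced the dimensionality of the hyperspectral data and extracted their features.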
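A minimal sketch of greedy layer-wise SAE training with the 351-234-156-102-68-34-15 topology is given below. It makes several assumptions that the paper does not confirm:

- each layer is a sigmoid auto-encoder trained for 50 epochs on MSE;
- no end-to-end fine-tuning follows;
- the learning rate and initialization are illustrative;
- the input matrix is a random placeholder for the 13 normalized training spectra.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ae_layer(X, n_hidden, epochs, lr, rng):
    """Train one sigmoid auto-encoder on MSE and return its encoder (W, b)."""
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                      # encode
        Y = sigmoid(H @ W2 + b2)                      # decode (reconstruction)
        dY = 2.0 * (Y - X) / X.size * Y * (1.0 - Y)   # gradient of the MSE
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1                                     # the decoder is discarded

def train_sae(X, hidden_sizes, epochs=50, lr=1.0, seed=0):
    """Greedy layer-wise training: each AE learns to reconstruct the codes
    produced by the previously trained layer."""
    rng = np.random.default_rng(seed)
    encoders, codes = [], X
    for n_hidden in hidden_sizes:
        W, b = train_ae_layer(codes, n_hidden, epochs, lr, rng)
        encoders.append((W, b))
        codes = sigmoid(codes @ W + b)                # input for the next AE
    return encoders, codes

def sae_encode(encoders, X):
    """Pass new spectra (e.g. the validation set) through the encoder stack."""
    for W, b in encoders:
        X = sigmoid(X @ W + b)
    return X

HIDDEN = [234, 156, 102, 68, 34, 15]                  # after the 351-band input
X_train = np.random.default_rng(1).random((13, 351))  # placeholder spectra
encoders, feats = train_sae(X_train, HIDDEN)
print(feats.shape)                                    # (13, 15)
```

Keeping the trained encoders allows the 7 validation spectra to be mapped into the same 15-dimensional feature space with `sae_encode`, without retraining.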

3.4 Description of the PSO Algorithm

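PSO is a swarm intelligence optimization algorithm modeled on the foraging behavior of bird flocks^[34]. It starts by randomly initializing m particles in an n-dimensional search space. For the i-th particle of the swarm:

- its position is the n-dimensional vector $x_i = [x_{i1}, x_{i2}, \cdots, x_{in}]^T$;
- its velocity is $v_i = [v_{i1}, v_{i2}, \cdots, v_{in}]^T$.

At each iteration, the fitness function is evaluated to obtain each particle's personal best and the swarm's global best, denoted $p_i = [p_{i1}, p_{i2}, \cdots, p_{in}]^T$ and $p_g = [p_{g1}, p_{g2}, \cdots, p_{gn}]^T$, respectively. Each particle then updates its velocity with Eq. (4) and its position with Eq. (5).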

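$$V_{id}^{k+1} = \omega V_{id}^{k} + c_1\,\mathrm{rand}_1\left(P_{id}^{k} - X_{id}^{k}\right) + c_2\,\mathrm{rand}_2\left(P_{gd}^{k} - X_{id}^{k}\right) \tag{4}$$

$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1} \tag{5}$$

where:

- $d = 1, 2, \cdots, n$ indexes the dimensions;
- $V_{id}^{k+1}$ and $V_{id}^{k}$ are the velocities of particle $i$ at iterations $k+1$ and $k$;
- $X_{id}^{k}$, $P_{id}^{k}$ and $P_{gd}^{k}$ are the $d$-th components of the particle's position, its personal best and the global best at iteration $k$;
- $\omega$ is the inertia weight;
- $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random numbers in [0, 1];
- $k$ is the current iteration number;
- $c_1$ and $c_2$ are learning factors.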
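Eqs. (4)-(5) can be sketched as a generic minimizer.

- The defaults follow Table 2: c₁ = c₂ = 1.5, 400 particles, 10 iterations, ω = 1.1, initial velocities in (-1, 1) and initial positions in (0, 0.1).
- Drawing rand₁ and rand₂ independently for each particle and dimension is an assumption.
- No velocity clamping is applied; many implementations add it, especially with ω > 1.
- The shifted sphere function in the usage line is only a test objective.

```python
import numpy as np

def pso(fitness, dim, n_particles=400, n_iter=10, w=1.1, c1=1.5, c2=1.5,
        v_init=(-1.0, 1.0), x_init=(0.0, 0.1), seed=0):
    """Minimize `fitness` using the velocity and position updates of Eqs. (4)-(5)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*x_init, size=(n_particles, dim))   # positions X_i
    V = rng.uniform(*v_init, size=(n_particles, dim))   # velocities V_i
    P = X.copy()                                        # personal bests p_i
    p_fit = np.array([fitness(x) for x in X])
    g = P[p_fit.argmin()].copy()                        # global best p_g
    history = [p_fit.min()]
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))             # rand_1
        r2 = rng.random((n_particles, dim))             # rand_2
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # Eq. (4)
        X = X + V                                            # Eq. (5)
        fit = np.array([fitness(x) for x in X])
        improved = fit < p_fit                          # update personal bests
        P[improved], p_fit[improved] = X[improved], fit[improved]
        g = P[p_fit.argmin()].copy()                    # update global best
        history.append(p_fit.min())
    return g, p_fit.min(), history

target = np.full(5, 0.05)
best_x, best_fit, history = pso(lambda x: float(np.sum((x - target) ** 2)), dim=5)
print(best_fit <= history[0])   # True
```

Because personal bests are only ever replaced by better positions, the recorded best fitness never increases from one iteration to the next.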

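Used alone for Chl-a inversion, a BP neural network converges slowly and is prone to local minima; PSO optimization can overcome both problems. Minimization of the loss function serves as the PSO fitness function. PSO iteratively searches for the best initial weights and thresholds, which are then assigned to the BP network for Chl-a inversion.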
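One way to treat the initial weights as particle positions for the 15-14-1 network is to flatten all weights and thresholds into a single vector and use the training MSE as the fitness. The vector has length 15×14 + 14 + 14×1 + 1 = 239. The following are assumptions, since the paper does not specify them:

- the parameter ordering;
- a sigmoid hidden layer;
- a linear output neuron.

The data are random placeholders.

```python
import numpy as np

N_IN, N_HID, N_OUT = 15, 14, 1                        # 15-14-1 network
DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT    # 239 = particle dimension

def decode(particle):
    """Slice a particle position vector into (W1, b1, W2, b2)."""
    i = 0
    W1 = particle[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = particle[i:i + N_HID]; i += N_HID
    W2 = particle[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = particle[i:i + N_OUT]
    return W1, b1, W2, b2

def bp_forward(params, X):
    """Sigmoid hidden layer and linear output neuron (both assumed)."""
    W1, b1, W2, b2 = params
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return (H @ W2 + b2).ravel()

def make_fitness(X_train, y_train):
    """Fitness of a particle = training MSE of the network it encodes."""
    def fitness(particle):
        pred = bp_forward(decode(particle), X_train)
        return float(np.mean((pred - y_train) ** 2))
    return fitness

rng = np.random.default_rng(0)
X_train, y_train = rng.random((13, N_IN)), rng.random(13)  # placeholder data
fitness = make_fitness(X_train, y_train)
print(DIM, fitness(rng.uniform(0.0, 0.1, DIM)))
```

Under this scheme, a PSO routine minimizes `fitness` over vectors of length `DIM`. `decode` applied to the best particle then supplies the initial weights and thresholds from which ordinary BP training continues.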

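Two training settings of the BP neural network matter most:

- **Learning rate.** It determines how fast the weights are updated. A learning rate that is too large makes the loss curve oscillate; one that is too small makes the loss converge too slowly.
- **Number of iterations.** This is the number of times the model is optimized. Training stops when the number of epochs reaches its maximum or the error reaches a preset value.

This experiment uses a three-layer network. The number of hidden neurons was determined through repeated experiments guided by Eqs. (6) and (7)^[35-36]. The PSO-optimized BP network with a 15-14-1 structure gave the highest accuracy, with R² of 0.82 on the training set and 0.81 on the validation set. Table 2 lists the parameter settings determined through repeated experiments.

$$p = \sqrt{v + q} + a \tag{6}$$

$$m = 2n + 1 \tag{7}$$

where p and m are numbers of hidden neurons, v and n are numbers of input neurons, q is the number of output neurons, and a is a constant between 1 and 10.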
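The empirical rules in Eqs. (6)-(7) only give starting points for the hidden-layer size. The sketch below evaluates them for the input sizes used in this paper: 15, 5, 19 and 351 inputs, each with one output. It assumes Eq. (6) takes the common form p = √(v + q) + a with a from 1 to 10. The helper names are illustrative.

```python
import math

def hidden_range_eq6(n_in, n_out, a_min=1, a_max=10):
    """Eq. (6), assumed form p = sqrt(v + q) + a, for a in [a_min, a_max]."""
    base = math.sqrt(n_in + n_out)
    return base + a_min, base + a_max

def hidden_eq7(n_in):
    """Eq. (7): m = 2n + 1."""
    return 2 * n_in + 1

for name, n_in in [("SAE-PSO-BP", 15), ("PCA-BP", 5), ("SR-BP", 19), ("BP, no reduction", 351)]:
    lo, hi = hidden_range_eq6(n_in, n_out=1)
    print(f"{name}: Eq. (6) {lo:.1f}-{hi:.1f}, Eq. (7) {hidden_eq7(n_in)}")
```

How the computed values relate to the chosen hidden sizes:

- For the 15-input network, Eq. (6) gives 5 to 14 hidden neurons, a range that contains the chosen 14.
- For the 5-input network, Eq. (7) gives 11, matching the PCA-BP hidden size reported in Section 4.2.
- For the 19-input network, Eq. (7) gives 39, matching the SR-BP hidden size reported in Section 4.2.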

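Tab. 2 Parameters of the SAE-PSO-BP prediction model

Parameter	Value
PSO learning factor c₁	1.50
PSO learning factor c₂	1.50
PSO initial population size	400
PSO maximum number of iterations	10
PSO initial random particle velocity	(-1, 1)
PSO initial random particle position	(0, 0.1)
PSO inertia weight ω	1.1
PSO-BP training epochs	5 000
PSO-BP learning rate	0.02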

4 Accuracy Evaluation

4.1 Inversion Results of the SAE-PSO-BP Model

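The accuracy of the Chl-a inversion models is evaluated with three metrics: the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute error (MAE). They are computed as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_{obs,i} - \hat{y}_{mod,i}\right)^2}{\sum_{i=1}^{n}\left(y_{obs,i} - \bar{y}_{obs}\right)^2} \tag{8}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_{mod,i} - y_{obs,i}\right)^2}{n}} \tag{9}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{mod,i} - y_{obs,i}\right| \tag{10}$$

where $y_{obs,i}$ is the measured Chl-a of the i-th sample, $\bar{y}_{obs}$ is the mean of the measured values, $\hat{y}_{mod,i}$ is the model-inverted value, and n is the total number of samples.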
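Eqs. (8)-(10) translate directly into NumPy. The three-sample arrays below are toy values for checking the functions, not data from this study.

```python
import numpy as np

def r2(y_obs, y_mod):
    """Eq. (8): coefficient of determination."""
    ss_res = np.sum((y_obs - y_mod) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_obs, y_mod):
    """Eq. (9): root mean square error, in the units of y."""
    return np.sqrt(np.mean((y_mod - y_obs) ** 2))

def mae(y_obs, y_mod):
    """Eq. (10): mean absolute error, in the units of y."""
    return np.mean(np.abs(y_mod - y_obs))

y_obs = np.array([1.0, 2.0, 3.0])   # toy measured values
y_mod = np.array([1.0, 2.0, 4.0])   # toy inverted values
print(r2(y_obs, y_mod), rmse(y_obs, y_mod), mae(y_obs, y_mod))  # ≈ 0.5, 0.577, 0.333
```

RMSE and MAE are in the units of the inputs. To obtain values in μg/L, normalized predictions must first be mapped back with the inverse of Eq. (3).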

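The SAE-PSO-BP model was used to invert Chl-a. Fig. 6 shows the inversion results and Fig. 7 their accuracy.

- **Training set:** R² = 0.82, RMSE = 2.65 μg/L, MAE = 1.89 μg/L.
- **Validation set:** R² = 0.81, RMSE = 2.25 μg/L, MAE = 1.76 μg/L.

The training and validation results show that the SAE-PSO-BP model inverts Chl-a with high accuracy and good stability.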


Fig. 6 Chl-a inversion results of the SAE-PSO-BP model


Fig. 7 Accuracy of the SAE-PSO-BP model inversion results

4.2 Inversion Results of Other Models

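To further assess how well the SAE-PSO-BP model inverts Chl-a, three comparison models were built:

(1) BP neural network without dimensionality reduction. The 550-900 nm spectra are used directly as model input to build a BP inversion model.

(2) PCA-BP model. PCA reduces the dimensionality of the 550-900 nm spectra. The first 5 principal components, with a cumulative contribution of 98.06%, are used as model input to build the PCA-BP inversion model.

(3) SR-BP model. SR reduces the dimensionality of the 550-900 nm spectra by progressively removing from the regression equation the variables that have no significant effect on Chl-a. The 19 characteristic wavelengths that remain are used as model input to build the SR-BP inversion model: 735, 871, 683, 711, 779, 697, 893, 726, 839, 879, 777, 900, 757, 854, 783, 801, 810, 620 and 682 nm.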
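The PCA step of the PCA-BP model can be sketched with NumPy's SVD:

1. center each band;
2. compute the cumulative explained-variance ratio (the "cumulative contribution");
3. keep the fewest leading components that reach a threshold.

The 0.98 threshold, the helper name `pca_reduce` and the synthetic low-rank matrix are assumptions for illustration. The paper reports only that its first 5 components reach 98.06%.

```python
import numpy as np

def pca_reduce(X, threshold=0.98):
    """Project X onto the fewest leading principal components whose
    cumulative explained-variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=0)                        # center each band
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2                                   # proportional to component variances
    cum_ratio = np.cumsum(var) / var.sum()         # cumulative contribution
    k = int(np.searchsorted(cum_ratio, threshold) + 1)
    scores = Xc @ Vt[:k].T                         # principal-component scores
    return scores, cum_ratio[:k]

# Synthetic stand-in: 20 spectra x 351 bands driven by 3 latent factors plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 351)) + 0.01 * rng.normal(size=(20, 351))
scores, cum = pca_reduce(X)
print(scores.shape, cum[-1])
```

Components are ranked by explained variance alone, without reference to Chl-a. This is the limitation of PCA discussed in the introduction.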

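The BP model without dimensionality reduction, the PCA-BP model and the SR-BP model all use a three-layer network:

- the number of input neurons equals the number of input features;
- the single output neuron represents Chl-a;
- the numbers of hidden neurons were determined through repeated experiments guided by Eqs. (6) and (7): 65, 11 and 39, respectively.

Figs. 8, 9 and 10 are line charts of the training-set inversion results of these three models. Figs. 11, 12 and 13 show the validation-set inversion accuracy of the same three models.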


Fig. 8 Chl-a inversion results of the BP neural network model without dimensionality reduction


Fig. 9 Chl-a inversion results of the PCA-BP model


Fig. 10 Chl-a inversion results of the SR-BP model


Fig. 11 Inversion accuracy of the BP neural network model without dimensionality reduction


Fig. 12 Inversion accuracy of the PCA-BP model


Fig. 13 Inversion accuracy of the SR-BP model

4.3 Accuracy Evaluation

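The three neural network models with dimensionality reduction (PCA-BP, SR-BP and SAE-PSO-BP) achieve higher R² than the BP model without reduction. The raw measured spectra contain many bands with strong redundancy, and this evidently hinders construction of the Chl-a inversion model.

Table 3 summarizes the accuracy of the four models. On the same training set, SAE-PSO-BP has the smallest RMSE (2.65 μg/L) and MAE (1.89 μg/L) of the three models with dimensionality reduction. SAE-PSO-BP therefore builds the better Chl-a inversion model.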

Tab. 3 Accuracy statistics of the four models

Dataset	Model	R²	RMSE (μg/L)	MAE (μg/L)
Training set	BP without dimensionality reduction	0.75	3.16	2.39
Training set	PCA-BP	0.79	2.85	2.29
Training set	SR-BP	0.80	2.79	2.38
Training set	SAE-PSO-BP	0.82	2.65	1.89
Validation set	BP without dimensionality reduction	0.73	2.75	2.56
Validation set	PCA-BP	0.77	2.56	2.07
Validation set	SR-BP	0.77	2.53	2.07
Validation set	SAE-PSO-BP	0.81	2.25	1.76

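The validation set shows the same pattern: SAE-PSO-BP has the highest R² (0.81) and the lowest RMSE (2.25 μg/L) and MAE (1.76 μg/L). Among the four models, SAE-PSO-BP also shows the smallest differences in R² and MAE between the training set (R² = 0.82, MAE = 1.89 μg/L) and the validation set (R² = 0.81, MAE = 1.76 μg/L). This indicates the highest stability.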
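The stability comparison can be checked directly against the values in Table 3. The short script below computes, for each model, the absolute difference between the training and validation values of each metric. The dictionary layout is an illustrative choice.

```python
# Training/validation values from Table 3: (R², RMSE, MAE)
table3 = {
    "BP (no reduction)": {"train": (0.75, 3.16, 2.39), "val": (0.73, 2.75, 2.56)},
    "PCA-BP":            {"train": (0.79, 2.85, 2.29), "val": (0.77, 2.56, 2.07)},
    "SR-BP":             {"train": (0.80, 2.79, 2.38), "val": (0.77, 2.53, 2.07)},
    "SAE-PSO-BP":        {"train": (0.82, 2.65, 1.89), "val": (0.81, 2.25, 1.76)},
}
metrics = ("R2", "RMSE", "MAE")

# Absolute train-validation gap per model and metric, rounded to table precision
gaps = {
    model: {m: round(abs(t - v), 2) for m, t, v in zip(metrics, d["train"], d["val"])}
    for model, d in table3.items()
}
for m in metrics:
    best = min(gaps, key=lambda model: gaps[model][m])
    print(m, best, gaps[best][m])
```

The script prints `R2 SAE-PSO-BP 0.01`, `RMSE SR-BP 0.26` and `MAE SAE-PSO-BP 0.13`. SAE-PSO-BP has the smallest R² and MAE gaps, while SR-BP has the smallest RMSE gap.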

5 Discussion

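The BP model without dimensionality reduction has the lowest accuracy of the four models. The main reason is that this experiment uses measured hyperspectral data: the bands used for Chl-a inversion span 550-900 nm, 351 bands in total, and they contain substantial redundancy that lowers accuracy.

The other three models each reduce dimensionality in a different way:

- **PCA-BP** uses the strong data compression of PCA to reduce the measured hyperspectral data to 5 principal components. Combined with the BP network, this improves inversion accuracy.
- **SR-BP** uses stepwise regression to introduce hyperspectral variables one at a time. It removes variables with no significant effect on Chl-a and keeps those with a significant effect, which selects an optimal subset of spectral variables, reduces dimensionality and improves BP accuracy.
- **SAE-PSO-BP** uses the SAE network to reconstruct the hyperspectral data, without supervision, into 15 new features. These are combined with the PSO-optimized BP network to improve accuracy.

Hyperspectral data capture spectral variation in finer detail and contain more water radiometric information. Even so, when Chl-a is inverted with a BP neural network, suitable dimensionality reduction and extraction of the key information are still needed.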

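Among the three models that reduce dimensionality before inversion, SAE-PSO-BP has the highest Chl-a accuracy and the best stability. The differences follow from how each method reduces the data:

- **PCA-BP.** During acquisition, hyperspectral water data are affected by water constituents, sensor viewing angle and other factors. This can produce large within-class variance in the spectral information, so the compressed data retain fewer spectral features and BP inversion accuracy suffers.
- **SR-BP.** Stepwise regression selects bands according to how well the spectral variables perform in a linear model. This tends to discard some bands to which nonlinear models are sensitive, and SR also compresses the data less than PCA and SAE do.
- **SAE.** The SAE learns hyperspectral features by minimizing reconstruction error. It achieves nonlinear dimensionality reduction while preserving as much of the water radiometric information in the original spectra as possible, and so reached the highest accuracy in these experiments.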

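Although SAE-PSO-BP performed well in these experiments, several issues in its application need further study:

- **SAE topology.** The feature spectra output by the SAE directly determine the final inversion accuracy, which makes the SAE topology a key parameter of the model. No theoretical method yet exists for setting the number of neurons in the SAE hidden layers.
- **Overfitting.** When processing normalized measured spectra, the SAE network may overfit if the number of training epochs is set badly. Adding noise tailored to measured spectra might suppress this, improving the generalization of the SAE network and the robustness of the neural network. The specific method remains to be worked out.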

6 Conclusions

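This study took the waters of Shaochedian and the villages of Quantou Township in Baiyangdian as its study area and systematically collected measured hyperspectral data and water quality parameter data. Based on the characteristics of hyperspectral data and neural network algorithms, it proposed the SAE-PSO-BP model for Chl-a remote sensing inversion.

The model draws on three strengths of the SAE network: a flexible, adjustable structure, low training difficulty and strong feature extraction. Its nonlinear dimensionality reduction preserves hyperspectral features well and improves the performance and accuracy of the neural network model. The model offers a new approach to Chl-a remote sensing inversion. The main conclusions are as follows: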

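(1) This study proposes an SAE-PSO-BP model suited to hyperspectral remote sensing inversion of Chl-a. It partly addresses the lack of a suitable dimensionality reduction method for neural networks that handle nonlinear problems. The model reduces the dimensionality of hyperspectral data while effectively extracting their deep features. The PSO algorithm searches for optimal initial network weights, which lowers the probability of local extrema and improves the stability and accuracy of BP-based Chl-a inversion.

On the training set, the SAE-PSO-BP model achieved R² = 0.82, RMSE = 2.65 μg/L and MAE = 1.89 μg/L. In inland Case II waters, the model can therefore effectively extract deep features of measured spectra, fit complex functional relationships and improve Chl-a inversion accuracy. It also offers a new approach for future Chl-a inversion from hyperspectral satellite imagery.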

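(2) Chl-a in the waters of Shaochedian and the villages of Quantou Township was inverted with four models: SAE-PSO-BP, BP without dimensionality reduction, PCA-BP and SR-BP. Their accuracy shows that the dimensionality of hyperspectral data should be reduced before the nonlinear relationship between Chl-a and the data is established.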

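(3) Chl-a remote sensing inversion in inland waters requires complex nonlinear models. For reducing the dimensionality of hyperspectral data, the SAE network outperforms traditional linear methods: it reduces dimensionality while retaining more water radiometric information. Its strong feature extraction also makes it better suited to neural network algorithms.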

References


[1]	Mishra S, Mishra D R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters[J]. Remote Sensing of Environment, 2012, 117:394-406. DOI:10.1016/j.rse.2011.10.016

[2]	Dai Q C, Zhan Z W, Tan Z, Jin L H, et al. The relationship between the distribution of nitrogen forms and eutrophication of water source reservoirs[J]. Journal of Anhui Agricultural Sciences, 2017, 45(10):59-62. DOI:10.13989/j.cnki.0517-6611.2017.10.020

[3]	Han Y Q, Huang L, Shi J, et al. Difference analysis of results of common methods for measuring primary productivity of water bodies[J]. Jiangsu Agricultural Sciences, 2018, 46(1):201-206. DOI:10.15889/j.issn.1002-1302.2018.01.053

[4]	Jin S, Han Z, Li X N, et al. Preliminary studies on the relationship between chlorophyll a, sea surface temperature and primary productivity of Yellow Sea green tide[J]. Transactions of Oceanology and Limnology, 2017(2):131-138. DOI:10.13984/j.cnki.cn37-1141.2017.02.018

[5]	Feng S M, Liu D Y, Li D J, et al. Analysis on the temporal and spatial distribution of the primary productivity and its influencing factors in Lake Taiping (Reservoir), Anhui Province[J]. Journal of Lake Sciences, 2016, 28(6):1361-1370. DOI:10.18307/2016.0622

[6]	Guo S J, Wang X J, Han P L, et al. Spatiotemporal characteristics of layered chlorophyll-a concentration and influencing factors in Danjiangkou Reservoir[J]. Journal of Lake Sciences, 2021, 33(2):366-376. DOI:10.18307/2021.0206

[7]	Li K L, Liao K, Dang H F. Recent progress in chromaticity remote sensing of inland and nearshore water bodies[J]. Remote Sensing for Natural Resources, 2023, 35(1):15-26.

[8]	Zhu Y F, Zhu L, Li J G, et al. The study of inversion of chlorophyll a in Taihu based on GF-1 WFV image and BP neural network[J]. Acta Scientiae Circumstantiae, 2017, 37(1):130-137. DOI:10.13671/j.hjkxxb.2016.0275

[9]	Lu F, Chen Z, Liu W Q, et al. Modeling chlorophyll-a concentrations using an artificial neural network for precisely eco-restoring lake basin[J]. Ecological Engineering, 2016, 95:422-429. DOI:10.1016/j.ecoleng.2016.06.072

[10]	Li X E, Sha J A, Wang Z L. Chlorophyll-a prediction of lakes with different water quality patterns in China based on hybrid neural networks[J]. Water, 2017, 9(7):524. DOI:10.3390/w9070524

[11]	Sun X T, Fu Y, Han C X, et al. An inversion method for chlorophyll-a concentration in global ocean through convolutional neural networks[J]. Spectroscopy and Spectral Analysis, 2023, 43(2):608-613. DOI:10.3964/j.issn.1000-0593(2023)02-0608-06

[12]	Cao Q, Yu G L, Qiao Z Y. Application and recent progress of inland water monitoring using remote sensing techniques[J]. Environmental Monitoring and Assessment, 2022, 195(1):1-16. DOI:10.1007/s10661-022-10690-9

[13]	Wang L H, Yue X J, Wang H H, et al. Dynamic inversion of inland aquaculture water quality based on UAVs-WSN spectral analysis[J]. Remote Sensing, 2020, 12(3):402. DOI:10.3390/rs12030402

[14]	Li N, Ning Z Y, Chen M A, et al. Satellite and machine learning monitoring of optically inactive water quality variability in a tropical river[J]. Remote Sensing, 2022, 14(21):5466. DOI:10.3390/rs14215466

[15]	Cao H Y, Gong T, Yuan C Z, et al. Quantitative retrieval of chlorophyll-a concentration in northern part of Lake Taihu based on RBF model[J]. Chinese Journal of Environmental Engineering, 2016, 10(11):6499-6504. DOI:10.12030/j.cjee.201506134

[16]	Jia S, Tang G H, Zhu J S, et al. A novel ranking-based clustering approach for hyperspectral band selection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(1):88-102. DOI:10.1109/TGRS.2015.2450759

[17]	Farrugia J, Griffin S, Valdramidis V P, et al. Principal component analysis of hyperspectral data for early detection of mould in cheeselets[J]. Current Research in Food Science, 2021, 4:18-27. DOI:10.1016/j.crfs.2020.12.003

[18]	Yang C E, Su L, Feng W Z, et al. Identification of Pleurotus ostreatus from different producing areas based on mid-infrared spectroscopy and machine learning[J]. Spectroscopy and Spectral Analysis, 2023, 43(2):577-582. DOI:10.3964/j.issn.1000-0593(2023)02-0577-06

[19]	Zhang X D, Li L, Mao H P, et al. Nondestructive testing method for rape water stress with multiple features information fusion based on PCA-BP method[J]. Journal of Jiangsu University (Natural Science Edition), 2016, 37(2):174-182. DOI:10.3969/j.issn.1671-7775.2016.02.009

[20]	Pan Y, Cao H X, Qi J G, et al. Quantitative monitoring models of plant water content in rapeseed based on hyperspectrum and related data mining[J]. Jiangsu Journal of Agricultural Sciences, 2022, 38(6):1550-1558. DOI:10.3969/j.issn.1000-4440.2022.06.013

[21]	Sun J, Tang K, Mao H P, et al. Hyperspectral detection of moisture content in rice based on MEA-BP neural network[J]. Food Science, 2017, 38(10):272-276. DOI:10.7506/spkx1002-6630-201710044

[22]	Zheng Y M, Zhang J, Chen X D, et al. Research on model and wavelength selection of near infrared spectral information[J]. Spectroscopy and Spectral Analysis, 2004, 24(6):675-678. DOI:10.3321/j.issn:1000-0593.2004.06.010

[23]	Cheriyadat A, Bruce L M. Why principal component analysis is not an appropriate feature extraction method for hyperspectral data[C]// IGARSS 2003. 2003 IEEE International Geoscience and Remote Sensing Symposium. Proceedings (IEEE Cat. No. 03CH37477). IEEE, 2004:3420-3422. DOI:10.1109/IGARSS.2003.1294808

[24]	Zhao Q C, Zhao S Y, Liu K, et al. Remote sensing inversion of COD in Baiyang Lake based on actually-measured spectra and Landsat8 image[J]. Modern Electronics Technique, 2019, 42(3):56-60. DOI:10.16652/j.issn.1004-373x.2019.03.014

[25]	Tang J W, Tian G L, Wang X Y, et al. The methods of water spectra measurement and analysis Ⅰ: Above-water method[J]. Journal of Remote Sensing, 2004, 8(1):37-44. DOI:10.3321/j.issn:1007-4619.2004.01.006

[26]	Lichtenthaler H K, Buschmann C. Chlorophylls and carotenoids: Measurement and characterization by UV-VIS spectroscopy[J]. Current Protocols in Food Analytical Chemistry, 2001, 1(1):F4.3.1-F4.3.8. DOI:10.1002/0471142913.faf0403s01

[27]	Xu L H, Xie D T, Fan F L. Effects of pretreatment methods and bands selection on soil nutrient hyperspectral evaluation[J]. Procedia Environmental Sciences, 2011, 10:2420-2425. DOI:10.1016/j.proenv.2011.09.376

[28]	Yang Z, Lu X P, Wu Y B, et al. Retrieval and model construction of water quality parameters for UAV hyperspectral remote sensing[J]. Science of Surveying and Mapping, 2020, 45(9):60-64,95. DOI:10.16251/j.cnki.1009-2307.2020.09.010

[29]	Yang Y, Li Y M, Wang Q A, et al. Retrieval of chlorophyll-a concentration in the turbid and eutrophic Taihu Lake[J]. Journal of Geo-information Science, 2009, 11(5):597-603. DOI:10.3724/sp.j.1047.2009.00597

[30]	Shi Y, Li Y P, Zhang L Q, et al. Characterizing chromophoric dissolved organic matter in Lake Gehu, Lake Dianshan and Lake Yangcheng in different hydrological seasons[J]. Journal of Lake Sciences, 2021, 33(1):168-180. DOI:10.18307/2021.0124

[31]	Wang L, Zhao D Z, Yang J H, et al. Near ultraviolet absorption spectral properties of chromophoric dissolved organic matter in the north area of Yellow Sea[J]. Spectroscopy and Spectral Analysis, 2010, 30(12):3379-3383. DOI:10.3964/j.issn.1000-0593(2010)12-3379-05

[32]	Badino L, Canevari C, Fadiga L, et al. An auto-encoder based approach to unsupervised learning of subword units[C]// 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014:7634-7638. DOI:10.1109/ICASSP.2014.6855085

[33]	Yim I, Shin J, Lee H, et al. Deep learning-based retrieval of cyanobacteria pigment in inland water for in-situ and airborne hyperspectral data[J]. Ecological Indicators, 2020, 110:105879. DOI:10.1016/j.ecolind.2019.105879

[34]	Yu Y J, Jiang W G, Xu M F. Prediction of chlorophyll a by BP neural network based on PSO algorithm[J]. Research of Environmental Sciences, 2011, 24(5):526-532. DOI:10.13198/j.res.2011.05.54.yuyj.013

[35]	Wang X L, Song Y Z, Kong F F, et al. Estimation of chlorophyll-a concentration in Taihu Lake by using back propagation (BP) neural network forecast model[J]. Chinese Journal of Agrometeorology, 2016, 37(4):408-414. DOI:10.3969/j.issn.1000-6362.2016.04.004

[36]	Huang Y G. Application of neural network in prediction of ultimate bearing capacity of long spiral bored concrete pile[D]. Wuhan: China University of Geosciences, 2007.
