Journal of Geo-information Science ›› 2022, Vol. 24 ›› Issue (6): 1130-1138. doi: 10.12082/dqxxkx.2022.210555

• Theory and Methods of Geo-information Science •

Crowd Density Estimation Method Considering Video Geographic Mapping

SUN Yinping, ZHANG Xingguo, SHI Xinyu, LI Qize

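  1. School of Geographic Sciences, Xinyang Normal University, Xinyang 464000, China
  • Received: 2021-09-15 Revised: 2021-11-19 Online: 2022-06-25 Published: 2022-08-25
  • Corresponding author: *ZHANG Xingguo (b. 1979), male, from Yiyang, Henan; Ph.D., associate professor; research focus: video GIS. E-mail: zhangxingguo2012@163.com
  • About the author: SUN Yinping (b. 1998), female, from Kaifeng, Henan; master's student; research focus: video GIS. E-mail: sunyinping2016@163.com
  • Supported by:
    National Natural Science Foundation of China (41401436); Natural Science Foundation of Henan Province (202300410345); Nanhu Scholars Program for Young Scholars of Xinyang Normal University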

Crowd Density Estimation Method Considering Video Geographic Mapping

SUN Yinping, ZHANG Xingguo, SHI Xinyu, LI Qize

  1. School of Geographic Sciences, Xinyang Normal University, Xinyang 464000, China
  • Received: 2021-09-15 Revised: 2021-11-19 Online: 2022-06-25 Published: 2022-08-25
  • Supported by:
    National Natural Science Foundation of China (41401436); Natural Science Foundation of Henan Province (202300410345); Nanhu Scholars Program for Young Scholars of XYNU

Abstract (Chinese version):

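To address crowd counting and map visualization in complex scenes, this paper proposes a crowd density estimation method that takes video geographic mapping into account. First, a crowd semantic segmentation model suited to complex scenes is built by transfer learning; video is combined with GIS to solve the homography matrix between the camera and the crowd scene; and, using the segmentation results and the homography matrix, crowd polygons are mapped onto a 2D map. Next, two partition schemes, equidistant partition and grid partition, are designed, and the crowd density of each sub-region under each scheme is computed from the segmentation results. Finally, the total number of people within the field of view is calculated from the trained crowd density of each sub-region, and the crowd polygons are filled uniformly with crowd point symbols for map visualization. The experimental results show that: (1) the crowd semantic segmentation model built in this paper achieves high-precision crowd segmentation in large scenes, with an accuracy of 94.11%; (2) combining video with GIS enables map projection and map visualization of crowds, so that crowds can be located, measured, and spatially analyzed; (3) crowds in surveillance video are counted accurately, and dividing the field of view into more sub-regions helps improve counting accuracy. Compared with crowd density estimation methods based on density maps, the proposed method has advantages in large, high-altitude, high-density scenes; it effectively addresses the difficulty of accurate crowd counting and map visualization and can be used for crowd supervision at large events, stations, shopping malls, and sports venues.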
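
As a minimal sketch of the homography step (not the authors' implementation), the following pure-Python code estimates a homography from matched pixel and map coordinates and applies it to a point. A homography has eight degrees of freedom, so at least four point pairs are needed. The function names `solve_linear`, `estimate_homography`, and `project`, as well as the sample coordinates, are illustrative assumptions. The normal-equation solver is used only to keep the example dependency-free; a production pipeline would more likely rely on a normalized, SVD-based estimator such as OpenCV's `findHomography`.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(pixels, map_points):
    """Least-squares homography (h33 fixed to 1) from >= 4 point pairs."""
    if len(pixels) < 4 or len(pixels) != len(map_points):
        raise ValueError("need at least four matched point pairs")
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixels, map_points):
        # x = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), likewise for y
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    # Normal equations (A^T A) h = A^T b reduce the problem to an 8x8 system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(h, point):
    """Map a pixel coordinate to map coordinates with homography h."""
    u, v = point
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return ((h[0][0] * u + h[0][1] * v + h[0][2]) / w,
            (h[1][0] * u + h[1][1] * v + h[1][2]) / w)


# Hypothetical control points: pixel corners and their map coordinates.
pixels = [(0, 0), (4, 0), (4, 3), (0, 3)]
map_points = [(2.0, 1.0), (6.0, 1.0), (6.0, 4.0), (2.0, 4.0)]
H = estimate_homography(pixels, map_points)
print(project(H, (2, 1)))
```

Projecting every vertex of a segmented crowd polygon with `project` would give the corresponding polygon on the 2D map.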

Keywords: geographic video, video GIS, crowd, semantic segmentation, homography matrix, map visualization, crowd density, crowd counting

Abstract:

To address the inability of existing crowd counting methods to count complex crowds accurately and visualize them on a map, a crowd density estimation method considering video geographic mapping is proposed. First, based on the DeepLab V3+ model, a crowd semantic segmentation model suitable for complex scenes is constructed by transfer learning. Combining video with GIS, a high-precision homography matrix between the video and the crowd scene map is calculated from four or more pixel coordinates in the video frame and their corresponding geographic coordinates. Using the segmentation model and the solved homography matrix, the crowd areas in the video are projected onto the map. Second, to improve counting accuracy, two partition schemes, equidistant partition and grid partition, are designed to divide the camera field of view (FOV). The crowd density of each sub-region under each scheme is computed from the semantic segmentation results, and the total number of people in the camera FOV is calculated from the crowd density and area of each sub-region. Third, using the solved homography matrix, the crowd segmentation result from real-time video is projected onto the 2D map, and the number of people is estimated from the crowd density. To obtain accurate crowd densities, a playground was used as the experimental area, and multiple crowd surveillance videos were collected at different times and under different crowd conditions.
The experimental results show that: (1) the crowd semantic segmentation model constructed in this paper achieves high-precision crowd segmentation in large scenes, with an accuracy of 94.11%; (2) by combining video with GIS and filling the crowd polygons with person-style point symbols, crowd mapping and visual expression are realized, achieving the goal of crowd localization, measurement, and spatial analysis; (3) accurate counting of crowds in surveillance video is realized, and dividing the camera FOV into more sub-regions helps improve counting accuracy. Compared with crowd density estimation methods based on density maps, the proposed method is suited to large, high-altitude, high-density scenes, especially where the texture of people's heads is unclear and crowd features are obscured. The method can effectively improve the accuracy of crowd counting and map visualization and can be used for crowd supervision at large-scale events, stations, shopping malls, and sports venues.
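
The counting step can be illustrated with a short sketch, assuming that the total is the sum over sub-regions of crowd density multiplied by the map area covered by crowd polygons in that sub-region. The function names, the sub-region labels, and all density and coordinate values below are hypothetical and are not taken from the paper's experiments.

```python
def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula), in squared map units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def estimate_count(crowd_by_subregion, density_by_subregion):
    """Sum (density * crowd area) over all sub-regions.

    crowd_by_subregion: sub-region id -> list of crowd polygons, each a list
        of (x, y) map coordinates already clipped to that sub-region.
    density_by_subregion: sub-region id -> persons per squared map unit.
    """
    total = 0.0
    for region_id, polygons in crowd_by_subregion.items():
        area = sum(polygon_area(p) for p in polygons)
        total += density_by_subregion[region_id] * area
    return total


# Hypothetical example: two sub-regions split by distance from the camera.
crowd = {
    "near": [[(0, 0), (10, 0), (10, 5), (0, 5)]],  # 50 square units
    "far": [[(0, 5), (10, 5), (10, 9), (0, 9)]],   # 40 square units
}
density = {"near": 2.0, "far": 1.5}  # assumed densities, persons per unit area
print(estimate_count(crowd, density))  # 160.0
```

Using a separate density for each sub-region lets the estimate reflect how crowd appearance changes with distance from the camera, which is consistent with the reported benefit of finer partitions.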

Key words: geographic video, video GIS, crowd, semantic segmentation, homography matrix, map visualization, crowd density, crowd counting