Research on filling methods for missing data in cultivated land quality evaluation

Chen Yu; Zhou Wu; Hu Yueming; Xie Jianwen

Article abstract

Submitted: 2021-04-06

DOI: 10.13254/j.jare.2021.0201

Keywords: evaluation of cultivated land quality, missing, data, filling, Conghua District, accuracy

Funding: National Key Research and Development Program of China (2020YFD1100204); National Natural Science Foundation of China (U1901601)

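Author	Affiliation	E-mail
Chen Yu	College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642
Zhou Wu	College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642
Hu Yueming	College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642; Guangdong Province Engineering Research Center for Land Information Technology, Guangzhou 510642; Guangdong Province Key Laboratory for Land Use and Consolidation, Guangzhou 510642; Key Laboratory of Construction Land Transformation, Ministry of Natural Resources, Guangzhou 510642; College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016; Qinghai-Guangdong Joint Key Laboratory of Natural Resources Monitoring and Evaluation, Xining 810016	yueminghugis@163.com
Xie Jianwen	College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642; Guangdong Province Engineering Research Center for Land Information Technology, Guangzhou 510642; Guangdong Province Key Laboratory for Land Use and Consolidation, Guangzhou 510642; Key Laboratory of Construction Land Transformation, Ministry of Natural Resources, Guangzhou 510642; Qinghai-Guangdong Joint Key Laboratory of Natural Resources Monitoring and Evaluation, Xining 810016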


Abstract:

Data are often lost during the survey and collection of cultivated land quality data because of human, environmental, and other factors, and existing methods for filling missing data have limited applicability. Studying filling methods for missing data in cultivated land quality evaluation is therefore important for completing the cultivated land quality database and improving evaluation accuracy. In this study, the cultivated land quality database of Conghua District, Guangzhou, was used as the sample set. Based on spatial correlation and spatial distribution, the dataset was divided into a spatially correlated dataset and a non-spatially correlated dataset. Several filling methods were applied to simulated missing data in each dataset, and the cross method was used to verify accuracy. The results showed that outliers accounted for less than 1.2% of the selected data, and that 25 groups of factors, including elevation, temperature, and available zinc, were spatially correlated. For the spatially correlated data, the four-image nearest neighbor algorithm gave the highest filling accuracy, still reaching 80% at missing rates below 20%, with accuracy decreasing as the missing rate increased. It was followed by the K-nearest neighbor (KNN) algorithm, the expectation maximization method, the multiple imputation method, and the regression model algorithm. The four-image nearest neighbor algorithm was more accurate than the KNN algorithm when the data were dense. For the non-spatially correlated data, the similar aggregation filling algorithm gave the highest filling accuracy, exceeding 80% at missing rates below 25%, followed by the expectation maximization method, the multiple imputation method, and the regression model algorithm. In summary, the four-image nearest neighbor algorithm and the similar aggregation filling algorithm proposed in this study are more accurate, more stable, and more widely applicable than the other algorithms for filling missing data in cultivated land quality evaluation.
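The missing-data simulation described in the abstract can be illustrated with a minimal Python sketch. It covers only the plain KNN baseline named above, not the four-image nearest neighbor or similar aggregation algorithms, whose details are not given here. Everything in it is an assumption made for illustration: the synthetic field standing in for a spatially correlated attribute, the inverse-distance weighting, the hypothetical helper names `knn_fill` and `simulate`, and the accuracy measure (share of filled values within 10% of the true value), which may differ from the metric used in the study.

```python
import math
import random


def knn_fill(points, values, k=4):
    """Fill None entries with the inverse-distance-weighted mean of the
    k nearest observed samples (plain KNN; not the paper's own algorithm)."""
    observed = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Rank observed samples by Euclidean distance to the missing one.
        nearest = sorted(observed, key=lambda j: math.dist(points[i], points[j]))[:k]
        weights = [1.0 / (math.dist(points[i], points[j]) + 1e-9) for j in nearest]
        filled[i] = sum(w * values[j] for w, j in zip(weights, nearest)) / sum(weights)
    return filled


def simulate(points, truth, missing_rate, k=4, tol=0.10, seed=0):
    """Hide a random share of values, fill them, and return the fraction of
    filled values within a relative tolerance of the truth (assumed metric)."""
    rng = random.Random(seed)
    n_missing = max(1, round(missing_rate * len(truth)))
    hidden = rng.sample(range(len(truth)), n_missing)
    masked = list(truth)
    for i in hidden:
        masked[i] = None
    filled = knn_fill(points, masked, k)
    hits = sum(abs(filled[i] - truth[i]) <= tol * abs(truth[i]) for i in hidden)
    return hits / n_missing


# Synthetic, spatially smooth field standing in for an attribute such as elevation.
rng = random.Random(42)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(400)]
truth = [100 + 5 * x + 3 * y for x, y in points]

for rate in (0.05, 0.10, 0.20, 0.30):
    print(f"missing rate {rate:.0%}: accuracy {simulate(points, truth, rate):.2f}")
```

Repeating the loop with another filling function in place of `knn_fill` would give a like-for-like comparison across missing rates, which is the kind of comparison the abstract reports.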
