Research on Filling Methods for Missing Data in Cultivated Land Quality Evaluation

Chen Yu (陈宇); Zhou Wu (周悟); Hu Yueming (胡月明)

Article abstract

Submitted: 2021-04-06  Revised: 2021-06-09


Keywords: cultivated land quality evaluation; missing data filling; Conghua District; accuracy

Funding: National Key R&D Program of China (2020YFD1100204); National Natural Science Foundation of China (U1901601)

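Author	Affiliation	Postal code
Chen Yu (陈宇)^*	College of Natural Resources and Environment, South China Agricultural University	510642
Zhou Wu (周悟)	College of Natural Resources and Environment, South China Agricultural University
Hu Yueming (胡月明)	College of Natural Resources and Environment, South China Agricultural University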


Abstract:

In cultivated land quality surveys, human and environmental factors cause data to go missing during investigation and collection, and existing methods for filling (imputing) missing data are not well suited to this setting. Studying filling methods for missing data is therefore important for completing the cultivated land quality database and improving the accuracy of cultivated land quality evaluation. This paper takes the cultivated land quality database of Conghua District as the sample set, divides it into spatially correlated and non-spatially correlated data sets according to spatial correlation and spatial distribution, simulates missing-data filling with several methods, and verifies accuracy with the cross method (十字交叉法). The results show that: 1) Outliers make up less than 1.2% of the selected data overall, and 25 groups of attributes, including elevation, air temperature and available zinc, are spatially correlated. 2) For spatially correlated data, the four-quadrant nearest neighbor algorithm has the highest filling accuracy, reaching 80% at missing rates below 20%, with accuracy decreasing as the missing rate increases. It is followed, in order, by the K-nearest neighbor algorithm, the expectation-maximization method, the multiple imputation method and the regression model algorithm. The four-quadrant nearest neighbor algorithm is more accurate than the K-nearest neighbor algorithm when the data are dense. 3) For non-spatially correlated data, the similarity aggregation filling algorithm has the highest filling accuracy, staying above 80% at missing rates up to 25%. It is followed by the expectation-maximization method, the multiple imputation method and the regression model algorithm. In summary, the two filling methods proposed in this paper are more accurate, more stable and more widely applicable than the other algorithms for filling missing data in cultivated land quality evaluation.
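
The abstract names a four-quadrant nearest neighbor (四象最近邻) filling algorithm for spatially correlated data but does not describe its steps. The sketch below shows one plausible reading, not the authors' implementation: for each sample with a missing value, take the nearest known sample in each of the four quadrants around it and combine them. The function name `four_quadrant_nn_fill` and the inverse-distance weighting are assumptions made for illustration.

```python
import numpy as np


def four_quadrant_nn_fill(coords, values):
    """Fill NaN entries of `values` from the nearest known sample per quadrant.

    Assumption (not from the paper): the estimate is the inverse-distance
    weighted mean of up to four neighbors, one in each quadrant around
    the target location.
    """
    coords = np.asarray(coords, dtype=float)
    filled = np.asarray(values, dtype=float).copy()
    known = ~np.isnan(filled)
    # Boolean indexing copies, so later fills never feed back into estimates.
    known_xy, known_v = coords[known], filled[known]

    for i in np.flatnonzero(~known):
        dx = known_xy[:, 0] - coords[i, 0]
        dy = known_xy[:, 1] - coords[i, 1]
        dist = np.hypot(dx, dy)
        # Quadrant index 0..3 derived from the signs of the offsets.
        quadrant = (dx < 0).astype(int) + 2 * (dy < 0).astype(int)

        picks = []
        for q in range(4):
            idx = np.flatnonzero(quadrant == q)
            if idx.size:
                picks.append(idx[np.argmin(dist[idx])])
        if not picks:
            continue  # no known sample anywhere; leave the gap unfilled

        picks = np.array(picks)
        if np.any(dist[picks] == 0):
            # A known sample sits at the same location: copy its value.
            filled[i] = known_v[picks][dist[picks] == 0][0]
        else:
            weights = 1.0 / dist[picks]
            filled[i] = np.sum(weights * known_v[picks]) / np.sum(weights)
    return filled


# Hypothetical toy data: four known samples around one missing sample.
coords = [(1, 1), (-1, 1), (-1, -1), (1, -1), (0, 0)]
values = [10.0, 20.0, 30.0, 40.0, np.nan]
filled = four_quadrant_nn_fill(coords, values)
```

Picking one neighbor per quadrant forces the estimate to draw on samples from all directions around the target, whereas a plain K-nearest neighbor search can select neighbors clustered on one side. That may be why the abstract reports better accuracy than K-nearest neighbor when the data are dense, though the abstract itself gives no explanation.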
