Article Abstract
Research on methods for filling missing data in cultivated land quality evaluation
Received: April 06, 2021
DOI:10.13254/j.jare.2021.0201
Keywords: evaluation of cultivated land quality, missing, data, filling, Conghua District, accuracy
Funding: National Key Research and Development Program of China (2020YFD1100204); National Natural Science Foundation of China (U1901601)
Author Name    Affiliation    E-mail
CHEN Yu College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China  
ZHOU Wu College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China  
HU Yueming College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
Guangdong Province Engineering Research Center for Land Information Technology, Guangzhou 510642, China
Guangdong Provincial Key Laboratory of Land Use and Consolidation, Guangzhou 510642, China
Key Laboratory of the Ministry of Natural Resources for Construction Land Transformation, Guangzhou 510642, China
College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016, China
Qinghai-Guangdong Joint Key Laboratory of Natural Resources Monitoring and Evaluation, Xining 810016, China 
yueminghugis@163.com 
XIE Jianwen College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
Guangdong Province Engineering Research Center for Land Information Technology, Guangzhou 510642, China
Guangdong Provincial Key Laboratory of Land Use and Consolidation, Guangzhou 510642, China
Key Laboratory of the Ministry of Natural Resources for Construction Land Transformation, Guangzhou 510642, China
Qinghai-Guangdong Joint Key Laboratory of Natural Resources Monitoring and Evaluation, Xining 810016, China 
 
Abstract:
      During the survey and collection of cultivated land quality data, values go missing because of human, environmental, and other factors, and existing methods for filling missing data are not sufficiently applicable to this setting. Studying missing-data filling methods for cultivated land quality evaluation is therefore important for completing the cultivated land quality database and improving evaluation accuracy. In this study, the cultivated land quality database of Conghua District, Guangzhou City was used as the sample set. According to spatial correlation and spatial distribution, the dataset was divided into a spatially correlated dataset and a non-spatially correlated dataset. Several filling methods were used to simulate missing-data filling on each, and cross-validation was used to verify accuracy. The results showed that outliers made up less than 1.2% of the selected data overall, and that 25 factors, including elevation, temperature, and available zinc, were spatially correlated. For the spatially correlated data, the four-image nearest neighbor algorithm had the highest filling accuracy, still reaching 80% when the missing rate was below 20%; accuracy decreased as the missing rate increased. It was followed by the K-nearest neighbor (KNN) algorithm, the expectation maximization algorithm, the multiple imputation algorithm, and the regression model algorithm. The four-image nearest neighbor algorithm was more accurate than KNN when the data were dense. For the non-spatially correlated data, the similar aggregation filling algorithm had the highest filling accuracy, exceeding 80% when the missing rate was below 25%, followed by the expectation maximization algorithm, the multiple imputation algorithm, and the regression model algorithm. In summary, the four-image nearest neighbor algorithm and the similar aggregation filling algorithm proposed in this study are more accurate, more stable, and more widely applicable than the other algorithms for filling missing data in cultivated land quality evaluation.
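The missing-data simulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the four-image nearest neighbor algorithm or the accuracy metric, so the sketch assumes (1) the algorithm averages the nearest known sample in each of the four quadrants around the target point, and (2) a filled value counts as correct when it lies within a relative tolerance of the hidden true value. The function names, the `tolerance` parameter, and the synthetic grid are all hypothetical.

```python
import math
import random


def knn_fill(known, target_xy, k=4):
    """Plain KNN fill: mean value of the k spatially nearest known samples."""
    if not known:
        raise ValueError("no known samples to fill from")
    nearest = sorted(known, key=lambda s: math.dist(s[:2], target_xy))[:k]
    return sum(s[2] for s in nearest) / len(nearest)


def quadrant_fill(known, target_xy):
    """Mean of the nearest known sample in each quadrant around the target.

    ASSUMPTION: this is one reading of the 'four-image' idea, not the
    definition used in the paper.
    """
    if not known:
        raise ValueError("no known samples to fill from")
    tx, ty = target_xy
    best = {}  # quadrant -> (distance, value) of the closest sample so far
    for x, y, value in known:
        quadrant = (x >= tx, y >= ty)
        d = math.dist((x, y), target_xy)
        if quadrant not in best or d < best[quadrant][0]:
            best[quadrant] = (d, value)
    return sum(v for _, v in best.values()) / len(best)


def simulate(samples, missing_rate, fill, tolerance=0.1, seed=0):
    """Hide a fraction of values, refill them, return the share within tolerance.

    ASSUMPTION: 'accuracy' here is the fraction of filled values whose
    relative error is at most `tolerance`; the paper may define it differently.
    """
    rng = random.Random(seed)
    n_missing = max(1, round(len(samples) * missing_rate))
    hidden = set(rng.sample(range(len(samples)), n_missing))
    known = [s for i, s in enumerate(samples) if i not in hidden]
    hits = 0
    for i in hidden:
        x, y, true_value = samples[i]
        estimate = fill(known, (x, y))
        if abs(estimate - true_value) <= tolerance * abs(true_value):
            hits += 1
    return hits / n_missing


# Synthetic, spatially smooth field on a 10 x 10 grid (illustrative data only).
samples = [(float(x), float(y), 100.0 + 2.0 * x + 3.0 * y)
           for x in range(10) for y in range(10)]

for rate in (0.1, 0.2, 0.3):
    acc_knn = simulate(samples, rate, knn_fill)
    acc_quad = simulate(samples, rate, quadrant_fill)
    print(f"missing rate {rate:.0%}: KNN {acc_knn:.2f}, quadrant {acc_quad:.2f}")
```

Both fill functions take `(x, y, value)` tuples, so the same `simulate` loop can compare any spatial filling method at several missing rates, which mirrors the comparison reported in the abstract.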