THE POSSIBLE DISTORTION OF ENVIRONMENTAL DATA CAUSED BY STATISTICAL PROCEDURES FOR DELETING ABNORMAL VALUES
Abstract: Statistical methods are now commonly used to reject abnormal values when processing environmental data. A large share of environmental data consists of monitoring data for trace components in the environment. According to current knowledge, the concentration of a trace component in the environment tends to follow a log-normal or strongly positively skewed distribution rather than a normal distribution, whereas the statistical methods commonly used to reject abnormal values were designed for normally distributed samples. Applying these methods to such non-normal samples therefore causes high values to be rejected unjustifiably, which reduces the accuracy of the environmental data. Using several examples of trace components in soil, this article shows how the commonly used Grubbs method distorts such samples to varying degrees, and proposes testing the distribution type of a sample before applying statistical rejection of abnormal values, in order to reduce or avoid this distortion.
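The effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure or data: the sample is synthetic (20 idealized log-normal quantiles), the function name `grubbs_statistic` is hypothetical, and the critical value 2.557 is the commonly tabulated one-sided 5% Grubbs value for n = 20, which should be checked against the table you actually use. The sketch computes the Grubbs statistic for the largest value on the raw scale and again after a log transform.

```python
import math
import statistics


def grubbs_statistic(values):
    """Grubbs statistic for the largest value: G = (max - mean) / s."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (max(values) - mean) / s


# Synthetic, idealized sample (an assumption, not data from the paper):
# the log-concentrations are evenly spaced quantiles of a standard normal,
# so the concentrations themselves are exactly log-normal in shape.
n = 20
normal = statistics.NormalDist()
log_conc = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
conc = [math.exp(z) for z in log_conc]

# Commonly tabulated one-sided 5% critical value for n = 20 (assumption:
# verify against your own Grubbs table before relying on it).
G_CRIT = 2.557

g_raw = grubbs_statistic(conc)      # test applied directly to concentrations
g_log = grubbs_statistic(log_conc)  # test applied after a log transform

print(f"raw scale: G = {g_raw:.3f}, flagged = {g_raw > G_CRIT}")
print(f"log scale: G = {g_log:.3f}, flagged = {g_log > G_CRIT}")
```

In this sketch the largest concentration exceeds the critical value on the raw scale but not on the log scale, even though every value belongs to the same log-normal sample by construction. This is the kind of unjustified rejection of high values that a prior check of the distribution type is meant to prevent.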