A comparison of appropriate methods for imputing censored values in geochemical datasets

Document Type : Research Article

Authors

1 Simulation and Data Processing Laboratory, Dept. of Mining Engineering, University of Tehran

2 Dept. of Mathematics, Statistics and Computer Science, University of Tehran

10.17383/S2251-6565(15)940916-X

Abstract

This study examines methods for imputing censored values in multivariate geochemical data. Missing values limit the use of most statistical methods, e.g. principal component analysis. Excluding samples that contain missing values biases the results and discards information. An appropriate approach to handling missing values is therefore necessary when analysing incomplete datasets. Taking into account the nature of geochemical data, this paper introduces several imputation approaches that have been proposed in recent years and are easy to apply in the R statistical software. These methods are then compared using the complete dataset of the Zafarghand region. The results show that multivariate imputation methods, and the ilr-EM method in particular, are preferable to the other methods.
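The comparison idea in the abstract (start from a complete dataset, censor it, impute, and measure how close the imputed values are to the known ones) can be illustrated with a minimal Python sketch. It is not the authors' procedure and does not implement the ilr-EM method. It uses a simple multiplicative-style replacement as a stand-in imputation, and every name, the detection limits, the fraction 0.65 of the detection limit, and the example composition are assumptions made for illustration only.

```python
import math


def censor(x, dl):
    """Mark parts below their detection limit as None (hypothetical encoding)."""
    return [None if v < d else v for v, d in zip(x, dl)]


def simple_replacement(x, dl, frac=0.65, total=1.0):
    """Impute censored parts with frac * DL (an assumed, commonly used choice)
    and rescale the observed parts so the composition sums to `total`.
    Ratios between observed parts are preserved."""
    deltas = [frac * d if v is None else 0.0 for v, d in zip(x, dl)]
    observed_sum = sum(v for v in x if v is not None)
    scale = (total - sum(deltas)) / observed_sum
    return [dlt if v is None else v * scale for v, dlt in zip(x, deltas)]


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Made-up four-part composition and detection limits (illustrative only).
true_comp = [0.70, 0.25, 0.04, 0.01]
dl = [0.02, 0.02, 0.02, 0.02]

censored = censor(true_comp, dl)            # last part falls below its limit
imputed = simple_replacement(censored, dl)  # stand-in imputation method
print("imputed composition:", imputed)
print("distance to true composition:", aitchison_distance(true_comp, imputed))
```

In a comparison of this kind, a smaller distance between the imputed and the true compositions, averaged over all samples, would indicate a better imputation method. The distance is computed on log-ratios because geochemical data are compositional.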

Keywords

Main Subjects

