Fuzzy clustering analysis of compositional data and its comparison with the exploratory dendrogram of compositional data: stream sediment geochemistry of the Anar region

Article type: Research article

Authors

Faculty of Mining Engineering, Yazd University

Abstract

Clustering methods are among the most important techniques in unsupervised data mining of geochemical data; when applied to variables, they reduce the dimensionality of the data. Among the various clustering methods, fuzzy clustering has attracted considerable attention in recent years because of the particular properties of fuzzy logic and its greater flexibility in identifying groups of similar data. In this study, a flexible fuzzy algorithm called FANNY was used to cluster the variables of stream sediment geochemical data, which are compositional in nature. Extensive research by statisticians, together with new methods for opening compositional data, has shown that different distances and relationships govern the space of such data, and that an isometric transformation to Euclidean space is needed before they can be used and interpreted with classical statistical methods. In the present study, after preparing the stream sediment geochemical data of the Anar region, Kerman (as an example of high-dimensional compositional data), an exploratory dendrogram of the variables was first computed and plotted in the simplex space using the default sequential binary partition (SBP); this method identified 4 clusters of similar variables. The same variables, opened with the clr transformation, were then clustered with the fanny algorithm. The fanny clustering of the variables showed acceptable agreement with the exploratory dendrogram of the compositional data. If the SBP required for the balances of the exploratory dendrogram in isometric coordinates is determined with more complete knowledge of the variables rather than by default, the dendrogram results will be considerably more accurate.

Keywords

Subjects


Article title [English]

Fuzzy clustering analysis of compositional data and comparing it with exploratory compositional data dendrogram, case study: Anar region stream sediments geochemistry

Authors [English]

  • Hamid Moini
  • Farhad Mohammadtorab
  • Majid Keykha Hosseinpour
Abstract [English]

Summary
Clustering is one of the most important methods in unsupervised data mining; when applied to variables, it leads to dimension reduction. Among clustering methods, fuzzy clustering is often preferred because of the special features of fuzzy logic and its greater flexibility in partitioning groups. In this study, the FANNY algorithm proposed by Kaufman and Rousseeuw was applied to cluster the variables of geochemical stream sediment data, which are compositional in nature. Recent research and new methods for opening compositional data show that a different definition of distance governs these data, so they must be transformed isometrically to Euclidean space before they can be interpreted with classical operations. In this case study, after preparation of the geochemical stream sediment data of the Anar region in Kerman, the exploratory dendrogram in the simplex space was first plotted and 4 clusters were obtained. The clr-transformed variables were then clustered with the fanny algorithm, and the result showed acceptable conformity with the dendrogram. If the balances of the SBP were determined manually with prior knowledge rather than by default, the results of the exploratory dendrogram would be more precise.
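To illustrate why compositional data are "opened" before classical distances are used, the following minimal sketch computes the centered log-ratio (clr) transform and the Aitchison distance as the Euclidean distance between clr vectors. The function names and the three-part sample values are assumptions made for illustration only; they are not taken from the Anar dataset or from the authors' code.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1 (a closed composition)."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return math.dist(clr(x), clr(y))

# Hypothetical three-part samples (illustrative values only).
a = [10.0, 30.0, 60.0]
b = [20.0, 20.0, 60.0]

d_raw = aitchison_distance(a, b)
d_closed = aitchison_distance(closure(a), closure(b))
print(round(d_raw, 6), round(d_closed, 6))
```

The two printed distances agree because the clr transform removes any common scale factor, which is the property that makes the distance meaningful for closed data.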
 
Introduction
Geochemical exploration based on stream sediment analysis is one of the most important methods for assessing mineral potential when prospecting brownfield areas. Over the last decades, different statistical methods have been developed to identify patterns in groups of associated geochemical elements. In this research, clustering of the stream sediment data of the Anar exploration region was analyzed with particular attention to the closed nature of geochemical datasets, using two known methods: fuzzy clustering and the exploratory dendrogram.
 
Methodology and Approaches
First, using the compositions package in R, the exploratory dendrogram of the compositional data was calculated and plotted in the simplex space, based on the Ward criterion and the default sequential binary partition (SBP) balances. This method detected 4 clusters. Then the fanny algorithm (cluster package), one of the most flexible fuzzy clustering algorithms, was applied to the clr-transformed data, and 4 clusters with the best silhouette were determined. The fuzzification degree was selected to be close to crisper methods such as the dendrogram, so that the results could be compared.
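As a rough illustration of what the fuzzy step optimizes, the sketch below evaluates the FANNY objective of Kaufman and Rousseeuw for a given membership matrix, together with Dunn's partition coefficient and its normalized form, which indicate how crisp a fuzzy partition is. It only evaluates these quantities; it is not the iterative fitting procedure in the R cluster package. The names `fanny_objective`, `dunn_coefficient`, the exponent argument `r` (corresponding to `memb.exp` in R's `fanny`) and the toy dissimilarities are assumptions for illustration.

```python
def fanny_objective(u, d, r=2.0):
    """FANNY criterion: sum over clusters v of
    sum_ij u[i][v]^r * u[j][v]^r * d[i][j] / (2 * sum_j u[j][v]^r)."""
    n, k = len(u), len(u[0])
    total = 0.0
    for v in range(k):
        w = [u[i][v] ** r for i in range(n)]
        num = sum(w[i] * w[j] * d[i][j] for i in range(n) for j in range(n))
        total += num / (2.0 * sum(w))
    return total

def dunn_coefficient(u):
    """Dunn's partition coefficient F(k) and its normalized form in [0, 1]."""
    n, k = len(u), len(u[0])
    f = sum(m * m for row in u for m in row) / n
    return f, (f - 1.0 / k) / (1.0 - 1.0 / k)

# Toy dissimilarities for four hypothetical variables forming two tight pairs.
d = [
    [0.0, 1.0, 10.0, 10.0],
    [1.0, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 1.0],
    [10.0, 10.0, 1.0, 0.0],
]
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5]] * 4

obj_crisp = fanny_objective(crisp, d)
obj_fuzzy = fanny_objective(fuzzy, d)
dunn_crisp = dunn_coefficient(crisp)
dunn_fuzzy = dunn_coefficient(fuzzy)
print(obj_crisp, obj_fuzzy, dunn_crisp, dunn_fuzzy)
```

In this toy case the partition that separates the two pairs has the lower objective and a normalized Dunn coefficient of 1, while the completely fuzzy partition scores 0. A membership exponent closer to 1 pushes the fitted memberships toward such crisp solutions, which is the sense in which a fuzzy result can be made comparable to a dendrogram.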
 
Results and Conclusions
Although different methods were applied to the transformed compositional data, their similar results showed very good conformity with lithology and geological structures and presented a good separation in the simplex space. If the balances in the SBP were defined manually, the reduced dimensions of the variables would be more informative.
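For readers unfamiliar with balances, the following sketch computes a single balance for one step of a sequential binary partition, using the standard definition: the log-ratio of the geometric means of the two groups of parts, scaled by sqrt(r*s/(r+s)), where r and s are the group sizes. The function name, the part indices and the sample values are hypothetical and chosen only to show how a manually defined split is turned into a coordinate.

```python
import math

def balance(x, plus, minus):
    """Balance between two groups of parts of composition x.

    plus and minus are lists of part indices for the numerator and
    denominator groups of one SBP step.
    """
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(x[i]) for i in plus) / r
    mean_log_minus = sum(math.log(x[i]) for i in minus) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)

# Hypothetical four-part sample (illustrative values only).
x = [4.0, 1.0, 2.0, 2.0]

b_groups = balance(x, plus=[0, 1], minus=[2, 3])  # groups with equal geometric means
b_pair = balance(x, plus=[0], minus=[1])          # single part against single part
b_scaled = balance([10.0 * v for v in x], plus=[0], minus=[1])
print(b_groups, b_pair, b_scaled)
```

Choosing `plus` and `minus` from geological knowledge, instead of accepting a default partition, is what the text refers to as defining the balances manually.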

Keywords [English]

  • Stream sediments
  • Anar region
  • FANNY algorithm
  • Compositional data (CoDa) dendrogram
  • Aitchison distance
  • clr-transform
[1] Hassani Pak, A. A., & Sharafuddin, M. (1384). Exploration data analysis. University of Tehran Press. (in Persian)

[2] Carranza, E. J. M. (2008). Geochemical anomaly and mineral prospectivity mapping in GIS (Vol. 11): Elsevier.

[3] Reimann, C., Filzmoser, P., & Garrett, R. G. (2002). Factor analysis applied to regional geochemical data: problems and possibilities. Applied geochemistry, 17(3), 185-206.

[4] Bezdek, J. C., Ehrlich, R. R., & Full, W. (1984). FCM: the fuzzy c-means clustering algorithm. Comput. Geosci., 10, 191-203.

[5] Gordon, A. D. (1999). Classification (2nd ed.). Chapman and Hall, Boca Raton.

[6] Bochang, Y., & Xuejing, X. (1985). Fuzzy cluster analysis in geochemical exploration. Journal of Geochemical Exploration, 23(3), 281-291.

[7] Grekousis, G., & Thomas, H. (2012). Comparison of two fuzzy algorithms in geodemographic segmentation analysis: The Fuzzy C-Means and Gustafson–Kessel methods. Applied Geography, 34, 125-136.

[8] De Carvalho, F. D. A., & Tenório, C. P. (2010). Fuzzy K-means clustering algorithms for interval-valued data based on adaptive quadratic distances. Fuzzy Sets and Systems, 161(23), 2978-2999.

[9] Ziaii, M., Pouyan, A. A., & Ziaei, M. (2009). Neuro-fuzzy modelling in mining geochemistry: Identification of geochemical anomalies. Journal of Geochemical Exploration, 100(1), 25-36.

[10] Templ, M., Filzmoser, P., & Reimann, C. (2008). Cluster analysis applied to regional geochemical data: problems and possibilities. Applied Geochemistry, 23(8), 2198-2213.

[11] Reimann, C., Filzmoser, P., & Garrett, R. G. (2005). Background and threshold: critical comparison of methods of determination. Science of the Total Environment, 346(1), 1-16.

[12] Chork, C., & Govett, G. (1985). Comparison of interpretations of geochemical soil data by some multivariate statistical methods, Key Anacon, NB, Canada. Journal of Geochemical Exploration, 23(3), 213-242.

[13] Basilevsky, A. (1994). Statistical factor analysis and related methods: theory and applications. Wiley series in probability and mathematical statistics, 737.

[14] Chork, C., & Salminen, R. (1993). Interpreting exploration geochemical data from Outokumpu, Finland: a MVE-robust factor analysis. Journal of Geochemical Exploration, 48(1), 1-20.

[15] Treiblmaier, H., & Filzmoser, P. (2010). Exploratory factor analysis revisited: How robust methods support the detection of hidden multivariate data structures in IS research. Information & management, 47(4), 197-207.

[16] Carranza, E. J. M. (2011). Analysis and mapping of geochemical anomalies using logratio-transformed stream sediment data with censored values. Journal of Geochemical Exploration, 110(2), 167-185.

[17] Filzmoser, P., Hron, K., & Reimann, C. (2009). Univariate statistical analysis of environmental (compositional) data: problems and possibilities. Science of the Total Environment, 407(23), 6100-6108.

[18] Wang, W., Zhao, J., & Cheng, Q. (2013). Fault trace-oriented singularity mapping technique to characterize anisotropic geochemical signatures in Gejiu mineral district, China. Journal of Geochemical Exploration, 134, 27-37.

[19] Aitchison, J. (1981). A new approach to null correlations of proportions. Journal of the International Association for Mathematical Geology, 13(2), 175-189.

[20] Aitchison, J. (1983). Principal component analysis of compositional data. Biometrika, 70(1), 57-65.

[21] Aitchison, J. (1984). The statistical analysis of geochemical compositions. Journal of the International Association for Mathematical Geology, 16(6), 531-564.

[22] Aitchison, J. (1986). The statistical analysis of compositional data (Vol. 25): Chapman & Hall.

[23] Aitchison, J. (1999). Logratios and natural laws in compositional data analysis. Mathematical Geology, 31(5), 563-580.

[24] Aitchison, J., Barceló-Vidal, C., Martín-Fernández, J., & Pawlowsky-Glahn, V. (2000). Logratio analysis and compositional distance. Mathematical Geology, 32(3), 271-275.

[25] Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., & Barceló-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3), 279-300.

[26] Egozcue, J. J., & Pawlowsky-Glahn, V. (2005). Groups of parts and their balances in compositional data analysis. Mathematical Geology, 37(7), 795-828.

[27] Buccianti, A., & Pawlowsky-Glahn, V. (2005). New perspectives on water chemistry and compositional data analysis. Mathematical Geology, 37(7), 703-727.

[28] Buccianti, A., Mateu-Figueras, G., & Pawlowsky-Glahn, V. (2006). Compositional data analysis in the geosciences: from theory to practice.

[29] Thió-Henestrosa, S., & Martín-Fernández, J. (2005). Dealing with compositional data: the freeware CoDaPack. Mathematical Geology, 37(7), 773-793.

[30] Geological Survey of Iran. (2016). Final report of BLEG geochemical exploration in Anar and Yazd 1:250,000 geological sheets. Tehran. (in Persian)

[31] Anar 1:250,000 geological quadrangle map (1981). Geological Survey of Iran. (in Persian)

[32] Aghanabati, A. (2004). Geology of Iran. Geological Survey of Iran. (in Persian)

[33] Van den Boogaart, K. G., & Tolosana-Delgado, R. (2013). Analyzing compositional data with R. Berlin: Springer.

[34] Egozcue, J. J., & Pawlowsky-Glahn, V. (2011). Basic concepts and procedures. Compositional Data Analysis: Theory and Applications, 12-28.

[35] Pawlowsky-Glahn, V., & Egozcue, J. J. (2006). Compositional data and their analysis: an introduction. Geological Society, London, Special Publications, 264(1), 1-10.

[36] Hassani Pak, A. A. (1384). Principles of Geochemical Explorations. University of Tehran Press. (in Persian)

[37] Kaufman, L., & Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

[38] Palarea-Albaladejo, J., Martín-Fernández, J. A., & Soto, J. A. (2012). Dealing with distances and transformations for fuzzy C-means clustering of compositional data. Journal of classification, 29(2), 144-169.

[39] Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20, 53-65.

[40] Palarea-Albaladejo, J., & Martin-Fernandez, J. A. (2014). zCompositions-package.

[41] Van den Boogaart, K. G., Tolosana, R., & Bren, M. (2011). compositions: Compositional Data Analysis. R package version 1.10-2. URL http://CRAN.R-project.org/package=compositions