Article type: Research Paper
Author
Department of Mining Engineering, Birjand University of Technology
Abstract
Geochemical data are imbalanced by nature: samples in the low-grade or background class are numerous, while samples in the high-grade or anomaly class are few. Classifying such data tends to produce a biased model, which lowers the probability that new samples are assigned to the minority class and reduces the accuracy and precision of the model. This paper introduces oversampling algorithms (such as SMOTE and ADASYN), undersampling algorithms (such as RUS and OSS), and hybrid-sampling algorithms (such as SMOTE-Tomek and ADASYN-CNN) for data balancing. The performance of these algorithms is evaluated on the stream sediment geochemical data of the Qayen sheet using the SVM and ANN classification methods. The results show that data balancing can raise confusion-matrix-based metrics such as accuracy, sensitivity, specificity, precision, F-score, F-value, G-mean and AUC by 10 to 50 percent, and reduce the error metric by about 10 percent. Oversampling algorithms perform best, followed by hybrid-sampling and then undersampling algorithms. Geochemical anomaly maps modeled with the balancing algorithms show that these models can increase the extent of geochemical anomalies in the study area and produce a good overlap between these anomalies and the mineralized rock units. In this respect, the oversampling algorithms (SMOTE and ADASYN) perform best, followed by the hybrid-sampling algorithm ADASYN-CNN. This paper therefore recommends balancing exploration data before classification, preferably with oversampling algorithms and, as a second choice, with hybrid-sampling algorithms.
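To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation together with a few of the confusion-matrix metrics named in the abstract. It is not the authors' implementation. The function names `smote` and `confusion_metrics`, the two-feature toy samples, and the confusion-matrix counts are all hypothetical and serve only as illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base`, searched within the minority class only
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def confusion_metrics(tp, fn, fp, tn):
    """Compute common metrics from confusion-matrix counts (anomaly = positive)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical toy data: two element concentrations per sample (made-up values).
background = [(20, 55), (22, 60), (19, 58), (25, 62),
              (21, 57), (23, 59), (24, 61), (20, 56)]
anomaly = [(80, 150), (95, 170), (88, 160)]

# Oversample the anomaly class until both classes have the same size.
synthetic = smote(anomaly, n_new=len(background) - len(anomaly), k=2)
balanced_anomaly = anomaly + synthetic

# Hypothetical confusion-matrix counts, used only to show the metric formulas.
metrics = confusion_metrics(tp=8, fn=2, fp=10, tn=80)
```

Because each synthetic point lies on a segment between two existing anomaly samples, the minority class gets denser without duplicating samples verbatim. A balanced set like this would then be passed to a classifier such as SVM or ANN.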