suitable regression model for the pre-processed spectra. The analysis concluded that Gaussian Process Regression (GPR) outperformed the other models considered. It was tuned with the following hyperparameters: a constant basis function, an isotropic rational quadratic kernel function, a kernel scale of 0.024937, a sigma of 51.9962, and standardized data. On the training set, GPR achieved a Root Mean Square Error (RMSE) of 3.93% and an R² value of 0.65. As illustrated in Figure 4, the GPR model was then used to predict the TIC values of 25 coal-stone dust mixture samples in the test set, yielding an R² value of 0.60 and an RMSE of 4.46%. These results closely match the metrics obtained during training. The close correspondence between training and test results suggests that the model did not overfit and that its predictive performance is stable. This outcome supports the reliability of the GPR model in capturing the underlying patterns in the data.
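The two metrics used above to compare training and test performance can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the function names `rmse` and `r_squared` and the sample values are assumptions introduced here and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical TIC values (%), for illustration only.
calculated = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 89.0]

print(f"RMSE = {rmse(calculated, predicted):.2f}")
print(f"R2   = {r_squared(calculated, predicted):.3f}")
```

Applied to the reported figures, the train-to-test gap is 0.53 percentage points in RMSE (3.93% to 4.46%) and 0.05 in R² (0.65 to 0.60).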
Classification Models
The classification models evaluated were Decision Tree, Support Vector Machine (SVM), Ensemble, and Neural Network (NN) classifiers. The Ensemble classifier performed best among them. It was further optimized with the following hyperparameters: the AdaBoost ensemble method, a maximum of 8 splits, 10 learners, and a learning rate of 0.53538. On the training set, the Ensemble classifier achieved an accuracy of 62.6%. The corresponding confusion matrix is illustrated in Figure 5.
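To illustrate what the number of learners and the learning rate control in AdaBoost, the sketch below implements a simplified version in plain Python. It is not the authors' model: it assumes a binary problem with labels +1 and -1, a single numeric feature, and one-split decision stumps rather than trees with up to 8 splits. The learning rate is applied as a shrinkage factor on each learner's vote, which is a common convention. All function names and the sample data are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """One-split weak learner: vote `polarity` at or above the threshold."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_learners=10, learning_rate=0.53538):
    """Train `n_learners` stumps, reweighting the samples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_learners):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        # Clip the error so the logarithm below stays finite.
        error = min(max(error, 1e-10), 1.0 - 1e-10)
        # Learner vote, shrunk by the learning rate.
        alpha = learning_rate * 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the weighted vote over all learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature data, for illustration only.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [-1, -1, -1, 1, 1, 1, 1, 1]
model = adaboost_fit(features, labels)
print([adaboost_predict(model, x) for x in features])
```

A smaller learning rate reduces each learner's influence on the final vote, so more learners are typically needed to reach the same training fit.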
The Ensemble classifier was then evaluated on an independent test set to assess its predictive performance and its ability to generalize to previously unseen data. As shown in Figure 6, the classifier was used to predict the categories of 25 coal/stone dust samples in the test set, resulting in an overall accuracy of 56%. This value is lower than the 62.6% accuracy observed during training. The discrepancy can be attributed to the difficulty the Ensemble classifier had in accurately classifying Category 1 (TIC < 70%) samples. Despite this difference, the classifier’s performance is in line with expectations, given the complexities associated with Category 1 classification. In comparison, the classification model achieved a high accuracy of 76.92% for Category 2 (TIC = 70%–80%).
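Overall and per-category accuracy, as quoted above, can both be read from a confusion matrix. The sketch below shows the arithmetic in plain Python; the function names and the three-class matrix are hypothetical and do not reproduce the counts behind Figures 5 or 6. For reference, an overall accuracy of 56% on 25 samples corresponds to 14 correct predictions.

```python
def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """For each true class, the fraction of its samples predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Rows are true categories, columns are predicted categories.
# Hypothetical counts, for illustration only.
confusion = [
    [4, 3, 1],
    [2, 9, 1],
    [0, 2, 8],
]

print(f"overall accuracy: {overall_accuracy(confusion):.2%}")
for index, value in enumerate(per_class_accuracy(confusion), start=1):
    print(f"category {index}: {value:.2%}")
```

Per-category accuracy can differ sharply from the overall figure when one category is harder to separate, which is the pattern described for Category 1.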
Regression Versus Classification Models
Two distinct models were successfully developed using machine learning techniques: regression and classification. The regression model yielded promising results,
Figure 4. Comparison of Predicted and Calculated Values (Test Set). [x-axis: Total Incombustible Concentration (%) (test set); y-axis: Predicted Total Incombustible Concentration (%)]
Figure 5. Confusion Matrix for Quadratic SVM Classification Model