NIR Spectral Data Acquisition
Data acquisition began with scanning the coal/stone dust
samples using the StellarNet NIR ADK device. A total of
400 samples were scanned: 100 coal samples and 300
coal/stone dust samples. Spectra were recorded through the
instrument’s interface. All scanned samples were sorted
into five groups, as summarized in Table 2. The raw spectra
are presented in Figure 2. The spectral data were then
pre-processed using a combination of MATLAB and Python
for the subsequent machine learning analysis. The spectra
of the 100 coal samples were excluded at this stage because
they were not used for model development. Pre-processing
comprised adjustment of the wavelength range, scatter
correction [7], smoothing [8], and Tukey’s Fence cleaning [9]
of the raw spectral data. The pre-processed spectra are
presented in Figure 3.
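The scatter-correction and outlier-cleaning steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the source does not state which scatter-correction method was used, so standard normal variate (SNV) is assumed here, and the Tukey's Fence step is assumed to act on one summary score per raw spectrum (its mean intensity) with the conventional multiplier k = 1.5. The function names `snv`, `tukey_fence_mask` and `clean_spectra` are hypothetical, and smoothing and wavelength trimming are omitted.

```python
import statistics


def snv(spectrum):
    """Assumed scatter correction: standard normal variate.

    Centres one spectrum on its own mean and scales it by its own
    sample standard deviation. A constant spectrum has zero spread
    and cannot be scaled, so it is rejected explicitly.
    """
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)
    if std == 0:
        raise ValueError("constant spectrum cannot be SNV-scaled")
    return [(value - mean) / std for value in spectrum]


def tukey_fence_mask(values, k=1.5):
    """Return True for each value inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= value <= high for value in values]


def clean_spectra(spectra, k=1.5):
    """Drop spectra whose mean intensity falls outside the Tukey fences.

    Using the mean intensity as the score is an assumption made for
    this sketch; any per-spectrum statistic could be substituted.
    """
    scores = [statistics.fmean(spectrum) for spectrum in spectra]
    keep = tukey_fence_mask(scores, k)
    return [spectrum for spectrum, ok in zip(spectra, keep) if ok]


# Toy data: four similar raw spectra and one obvious outlier.
raw = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [1.0, 2.2, 3.0],
    [50.0, 60.0, 70.0],
]
cleaned = clean_spectra(raw)                 # outlier spectrum removed
corrected = [snv(spectrum) for spectrum in cleaned]
```

In this sketch the fence is applied to the raw spectra before SNV, because SNV forces every spectrum to zero mean and would make a mean-based score uninformative. The order in which the original study applied these steps is not specified beyond the list above.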
Table 2. Categories of Coal and Coal/Stone Dust Samples
Category   Internal Actual TIC   Number of Samples
1          70%                   28
2          70%–80%               116
3          80%–85%               78
4          85%                   78
Coal       Original Coal         100
MODEL DEVELOPMENT AND EVALUATION
After pre-processing, machine learning techniques were
applied to the data to develop and evaluate both regression
and classification models. The spectral dataset was randomly
partitioned into two subsets: a training set of 227 spectra
(90% of the data) and a test set of 25 spectra (10% of the
data). The training set was used to build and refine the
machine learning models in MATLAB R2022b. The test set
was then used to assess the performance and generalization
of the developed models.
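The random 90/10 partition can be sketched in a few lines of Python. This is an illustration only: the study built its models in MATLAB R2022b and does not describe how the split was drawn, so the function name `split_indices`, the fixed seed and the rounding rule are all assumptions. The total of 252 spectra is simply the sum of the reported training and test set sizes (227 + 25).

```python
import random


def split_indices(n_samples, test_fraction=0.10, seed=0):
    """Randomly split sample indices into training and test subsets.

    The seed makes the shuffle repeatable; the test set size is the
    requested fraction of the data rounded to the nearest integer.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# 252 spectra in total, i.e. 227 for training plus 25 for testing.
train_idx, test_idx = split_indices(252)
print(len(train_idx), len(test_idx))  # prints "227 25"
```

Because the split is purely random, the four TIC categories are not guaranteed to appear in the test set in the same proportions as in the training set; a stratified split would be one alternative if that matters.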
Regression Models
Several regression models were assessed, including Linear,
Support Vector Machine (SVM), Gaussian Process Regression
(GPR), and Neural Network (NN), with the objective of
finding the most
Figure 1. TIC Target by Region