XXXI International Mineral Processing Congress 2024 Proceedings/Washington, DC/Sep 29–Oct 3 895
variables were collected, covering about 6 years of historical
data. Additional characteristics of the variables have been
presented in our previous publications (Amankwaa-Kyeremeh
et al., 2023; Amankwaa-Kyeremeh et al., 2024;
Amankwaa-Kyeremeh et al., 2021d). Observations from transient
operation were flagged as outliers and deleted before further
pre-processing. The further pre-processing of the data set was
based on domain knowledge of the steady operating bounds of
each rougher flotation variable. The overall pre-processing of
the data set resulted in 1325270 useful observations for the
analysis. For data confidentiality, standardised data, obtained
using the z-score transformation, were used in model
development throughout this work (Amankwaa-Kyeremeh et al.,
2020a, b). Figure 1 shows the scatter plot of the input and
output variables, revealing characteristic features of the
variables and their relationships with the other input
variables and the output variable.
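As a rough illustration of the z-score transformation mentioned above, the following Python sketch standardises each column of a small array. The array `raw` and the helper name `zscore` are hypothetical stand-ins, not the plant data or the authors' MATLAB code; the sample standard deviation (n - 1 denominator) is assumed, which matches the default of MATLAB's `zscore`.

```python
import numpy as np

def zscore(data):
    """Standardise each column to zero mean and unit sample standard deviation."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)  # assumption: n - 1 denominator
    return (data - mean) / std

# Hypothetical raw readings: rows are observations, columns are variables.
raw = np.array([[10.0, 200.0],
                [12.0, 220.0],
                [14.0, 260.0],
                [16.0, 240.0]])
standardised = zscore(raw)
print(standardised)
```

After this transformation every variable is expressed in units of its own standard deviation, so the original magnitudes cannot be read from the standardised values alone.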
Model development
Support vector machine, Gaussian process regression,
multi-layer perceptron artificial neural network, linear
regression and random forest algorithms were used to develop
models between the input and output variable(s). A random
sub-sample of 30000 observations was selected as the optimum
data size by monitoring the computational time and training
error (Figure 2). As shown in Table 3, going beyond 30000
observations with the linear regression algorithm only
increases the computational cost, with no significant
improvement in model performance.
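The sample-size check described above can be sketched in Python as a simple sweep over sub-sample sizes, recording the training error and fit time of a linear regression at each size. This is only an illustration under stated assumptions: the data are synthetic, the candidate sizes are arbitrary, and `fit_linear` is a hypothetical helper built on ordinary least squares, not the authors' MATLAB implementation.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the standardised data set (assumption, not plant data).
n_total, n_inputs = 200_000, 5
X = rng.standard_normal((n_total, n_inputs))
true_coef = rng.standard_normal(n_inputs)
y = X @ true_coef + 0.5 * rng.standard_normal(n_total)

def fit_linear(X_sub, y_sub):
    """Ordinary least squares with an intercept; returns coefficients and training RMSE."""
    A = np.column_stack([np.ones(len(X_sub)), X_sub])
    coef, *_ = np.linalg.lstsq(A, y_sub, rcond=None)
    train_rmse = np.sqrt(np.mean((A @ coef - y_sub) ** 2))
    return coef, train_rmse

results = []
for size in (1_000, 5_000, 10_000, 30_000, 100_000):
    idx = rng.choice(n_total, size=size, replace=False)  # random sub-sample
    start = time.perf_counter()
    _, train_rmse = fit_linear(X[idx], y[idx])
    elapsed = time.perf_counter() - start
    results.append((size, train_rmse, elapsed))
    print(f"{size:>7d} obs  training RMSE = {train_rmse:.4f}  fit time = {elapsed:.4f} s")
```

A data size would be chosen at the point where the training error stops improving noticeably while the fit time keeps growing.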
The hold-out cross-validation approach was used to randomly
divide the sampled data set into an 80% training data set
(24000 observations) and a 20% validation data set (6000
observations). The remaining 1295270 observations were used
for model testing. Details on the implementation of the SVM,
GPR, ANN, RF and LR algorithms in MATLAB R2020 are provided
in our previous articles (Amankwaa-Kyeremeh et al., 2023;
Amankwaa-Kyeremeh et al., 2024; Amankwaa-Kyeremeh et al.,
2021c; Amankwaa-Kyeremeh et al., 2021d).
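The partition sizes quoted above can be reproduced with an index-based split. The sketch below is a minimal Python illustration, assuming a single random permutation of the observation indices; the seed and variable names are hypothetical and the authors' MATLAB procedure may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

n_total = 1_325_270               # usable observations after pre-processing
n_sample = 30_000                 # randomly sub-sampled modelling set
n_train = n_sample * 80 // 100    # 80% of the sample for training

shuffled = rng.permutation(n_total)
sample_idx = shuffled[:n_sample]  # sub-sample used for model development
test_idx = shuffled[n_sample:]    # everything else is held back for testing

train_idx = sample_idx[:n_train]  # training indices
val_idx = sample_idx[n_train:]    # validation indices

print(len(train_idx), len(val_idx), len(test_idx))
```

Because all three index sets come from one permutation, they are disjoint by construction, so no test observation can leak into training or validation.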
Model performance assessment
The performance of the predictive models was evaluated
using four assessment indicators, shown in equations 1–4:
the correlation coefficient (r), root mean square error
(RMSE), mean absolute percentage error (MAPE) and variance
accounted for (VAF). A 95% confidence interval was used for
the data, predictions and performance assessment indicators.
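\[
r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
         {\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \times \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}
\tag{1}
\]
\[
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\tag{2}
\]
\[
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%
\tag{3}
\]
\[
\mathrm{VAF} = \left( 1 - \frac{\operatorname{var}(y_i - \hat{y}_i)}{\operatorname{var}(y_i)} \right) \times 100\%
\tag{4}
\]
where
\(y_i\) = ith true rougher copper recovery value
\(\bar{y}\) = mean of true rougher copper recovery values
\(\hat{y}_i\) = ith predicted rougher copper recovery value
\(\bar{\hat{y}}\) = mean of predicted rougher copper recovery values
\(n\) = total number of observations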
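For readers who want to compute the four indicators, a minimal Python sketch is given below. The function names and the small example arrays are hypothetical; the formulas follow the definitions of r, RMSE, MAPE and VAF given in this section. Note that MAPE divides by each true value, so it is undefined when a true value is exactly zero.

```python
import numpy as np

def correlation_coefficient(y_true, y_pred):
    """Correlation coefficient between true and predicted values."""
    dy = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return np.sum(dy * dp) / np.sqrt(np.sum(dy ** 2) * np.sum(dp ** 2))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def vaf(y_true, y_pred):
    """Variance accounted for, in percent."""
    return (1.0 - np.var(y_true - y_pred) / np.var(y_true)) * 100.0

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

print("r    =", correlation_coefficient(y_true, y_pred))
print("RMSE =", rmse(y_true, y_pred))
print("MAPE =", mape(y_true, y_pred))
print("VAF  =", vaf(y_true, y_pred))
```

A perfect model gives r = 1, RMSE = 0, MAPE = 0% and VAF = 100%, which is a convenient sanity check for any implementation.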
Table 1. Motivations for the selected predictive algorithms

| Algorithm | Advantages | Reference |
|---|---|---|
| SVM | SVM is memory efficient, versatile and effective in handling high-dimensional spaces, even when the number of dimensions is greater than the number of samples. | (Bhavsar and Panchal, 2012) |
| GPR | GPR is a non-parametric approach that can cater for uncertainties in data and can quantitatively model the existing noise in the measured data. GPR has shown satisfactory performance in predicting rougher copper recovery in our previous work. | (Arthur et al., 2020); (Amankwaa-Kyeremeh et al., 2021d) |
| ANN | ANN has seen much success in its application for the modelling and control of copper flotation and many other fields of mineral processing. | (Lee et al., 1991) |
| LR | LR models are easy to implement, interpret and train. | (Grömping, 2006) |
| RF | Attractive features of RF include a small number of tuneable hyperparameters and general resistance to overfitting. | (Breiman and Cutler, 2003) |