1450 XXXI International Mineral Processing Congress 2024 Proceedings/Washington, DC/Sep 29–Oct 3
Ensemble learning is a powerful method of training multiple models on different parts of the data set. The approach takes advantage of the wisdom of the crowd: the average of multiple models tends to be more accurate than any single model. The technique also helps to minimize the risk of over-fitting, because each model sees only part of the data. The Cubist model commonly performs well with non-linear and categorical geometallurgical data.
Figure 5 compares actual DWi with predicted DWi for the LR and Cubist models.
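The "wisdom of the crowd" effect can be illustrated with a small bagging-style ensemble: each member is trained on a different random subset, and by the convexity of squared error the averaged prediction can never have a higher mean squared error than the members' average error. The sketch below is illustrative only, using synthetic data and simple linear members; it is not the Cubist algorithm, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic non-linear response (illustrative only, not the paper's data)
X = rng.uniform(0.0, 10.0, size=(500, 1))
y = 2.0 * np.sin(X[:, 0]) + 0.3 * X[:, 0] + rng.normal(0.0, 0.5, size=500)

X_test = rng.uniform(0.0, 10.0, size=(200, 1))
y_test = 2.0 * np.sin(X_test[:, 0]) + 0.3 * X_test[:, 0]

def fit_linear(Xs, ys):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(Xs)), Xs])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

def predict(coef, Xs):
    A = np.column_stack([np.ones(len(Xs)), Xs])
    return A @ coef

# Train each ensemble member on a different random part of the data
members = []
for _ in range(20):
    idx = rng.choice(len(X), size=100, replace=False)
    members.append(fit_linear(X[idx], y[idx]))

preds = np.array([predict(c, X_test) for c in members])   # shape (20, 200)
member_mses = ((preds - y_test) ** 2).mean(axis=1)
ensemble_mse = ((preds.mean(axis=0) - y_test) ** 2).mean()

# By Jensen's inequality, averaging predictions cannot do worse than
# the average error of the individual members.
assert ensemble_mse <= member_mses.mean() + 1e-12
```

The final assertion holds for any data and any set of members, which is the mathematical basis of the averaging argument in the text.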
We generated LR and Cubist predictive models in an iterative procedure, increasing the number of samples and comparing the improvement in the models using the root mean square error (RMSE) regression metric. The samples were drawn randomly from the synthetic data set. Sample numbers were increased in unit steps from 10 to 100, and in steps of 10 from 100 to 500.
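The sweep just described can be sketched as a simple loop: draw n random samples, fit a model, and record the hold-out RMSE, where RMSE = sqrt(mean((y − ŷ)²)). In the sketch below an ordinary least-squares fit stands in for the LR step (a Cubist fit would slot into the same loop); the synthetic data and helper names are our own illustrations, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in synthetic data set (illustrative; not the paper's data)
X_all = rng.normal(size=(2000, 3))
y_all = X_all @ np.array([1.5, -2.0, 0.7]) + rng.normal(0.0, 1.0, size=2000)
X_test, y_test = X_all[1500:], y_all[1500:]   # fixed hold-out set

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict_lr(X_tr, y_tr, X_te):
    """Fit ordinary least squares (with intercept) and predict on X_te."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

# Unit steps from 10 to 100, then steps of 10 up to 500
sample_counts = list(range(10, 100)) + list(range(100, 501, 10))
rmse_curve = []
for n in sample_counts:
    idx = rng.choice(1500, size=n, replace=False)   # draw from training pool
    rmse_curve.append(rmse(y_test, fit_predict_lr(X_all[idx], y_all[idx], X_test)))
```

Plotting `rmse_curve` against `sample_counts` reproduces the shape of curve discussed next: rapid early improvement followed by a plateau.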
Figure 6 shows the RMSE for the LR and Cubist models as the number of samples increases from 10 to 500. Both models improve rapidly before their performance stabilizes. This transition is especially sharp for the LR model, where the RMSE stabilizes at 2.3 and there is no significant improvement beyond 100 samples. The Cubist model continues to improve beyond 100 samples, asymptotically approaching an RMSE of approximately 1.4. This simple comparison shows that, for this mineral deposit:
• Increasing the number of drop weight tests beyond 100 does not add value to the LR model. If LR is the selected regression modelling method, further drop weight testing is unnecessary.
• The Cubist model performs consistently better than the LR model. A set of 100 drop weight tests produces an RMSE of about 2.1. If this is an acceptable level of error, further testing is unnecessary. The Cubist model can be improved by carrying out more than 100 drop weight tests, but the benefits, measured by RMSE, steadily diminish.
The test demonstrates the principle that changes in RMSE as the number of samples increases can be used to identify a threshold at which either the error of the predictive model is sufficiently low to be fit for purpose, or further testing will not significantly improve the predictions. This threshold is data-driven and does not rely upon rules of thumb or a priori assumptions. The threshold will change depending on the processing response being predicted, the modelling algorithms being used and the number of variables/features used by the models.
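The data-driven threshold described above can be located programmatically: scan the RMSE-versus-sample-count curve and stop at the first point where either the error is already fit for purpose or the marginal improvement per step falls below a tolerance. A minimal sketch; the function name, tolerance values and example curve are illustrative, not from the paper.

```python
def sampling_threshold(sample_counts, rmse_curve, rmse_target=None, min_gain=0.01):
    """Return the first sample count at which testing can stop.

    Stops when RMSE is already fit for purpose (<= rmse_target), or when
    the relative RMSE improvement per step falls below min_gain.
    """
    for i in range(1, len(sample_counts)):
        if rmse_target is not None and rmse_curve[i] <= rmse_target:
            return sample_counts[i]
        gain = (rmse_curve[i - 1] - rmse_curve[i]) / rmse_curve[i - 1]
        if gain < min_gain:
            return sample_counts[i]
    return sample_counts[-1]

# Illustrative curve: rapid improvement, then a plateau
counts = [10, 25, 50, 75, 100, 150, 200]
errors = [5.0, 3.5, 2.8, 2.4, 2.3, 2.29, 2.29]
n_stop = sampling_threshold(counts, errors, min_gain=0.02)   # → 150
```

With `min_gain=0.02` the scan stops at 150 samples, where the step-to-step improvement first drops below 2%; supplying `rmse_target` instead implements the "fit for purpose" stopping rule.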
CONCLUSIONS
Well-designed metallurgical sampling programmes start with clarity about the objectives of the test work. Applying geometallurgical principles to the sample selection ensures that maximum leverage is obtained from the metallurgical testwork. Data science and machine learning provide tools for the assessment of multivariate data that aid interpretation of relationships and avoid the limitations of univariate and bivariate statistical methods.
The trial outlined in the paper demonstrates that the
impact of increasing sample numbers on the accuracy
of linear regression and Cubist algorithms can be tested
and the RMSE measured. For the target variable tested,
drop weight index, both models clearly show a minimum
Figure 5. Model predictive performance for DWi for the LR and Cubist models