with every added ton of throughput, penalizing any
amount exceeding the maximum feed rate.
• Reject Amount Constraint: The third objective caps the reject amount, penalizing excesses so that the material load stays within the capacity of specific plant areas, such as the dynamic separator or bucket elevator.
The cumulative reward combines these components, align-
ing them with the specific requirements of the grinding
circuit. This alignment drives the reinforcement learning
algorithm towards solutions that are efficient and compli-
ant with operational constraints, mirroring the intended
control strategies for the system.
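As a rough illustration, the sketch below shows how such a composite reward could be assembled; the weights, targets, and limits are placeholder assumptions rather than values used in this study.

```python
# Illustrative weights and limits only; the actual values would be tuned
# to the specific grinding circuit and are not reproduced here.
W_SIZE, W_FEED, W_REJECT = 1.0, 0.5, 0.5
MAX_FEED_TPH = 120.0    # assumed maximum feed rate [t/h]
MAX_REJECT_TPH = 40.0   # assumed reject capacity [t/h]

def cumulative_reward(product_size, size_target, feed_rate, reject_rate):
    """Combine particle size, throughput, and reject objectives into one scalar."""
    # Particle size regulation: penalize deviation from the target fineness.
    r_size = -W_SIZE * abs(product_size - size_target)
    # Throughput: reward each ton fed, penalize exceeding the maximum feed rate.
    r_feed = W_FEED * feed_rate
    if feed_rate > MAX_FEED_TPH:
        r_feed -= 2.0 * W_FEED * (feed_rate - MAX_FEED_TPH)
    # Reject constraint: penalize reject amounts above the assumed plant capacity.
    r_reject = -W_REJECT * max(0.0, reject_rate - MAX_REJECT_TPH)
    return r_size + r_feed + r_reject
```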
Training and Evaluating Reinforcement Learning
Algorithms
In this study, we utilized the open-source reinforcement learning library Stable-Baselines3 (Raffin et al., 2021).
This library simplifies the implementation and testing of
various reinforcement learning setups, providing a robust
platform for our experiments.
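For context, a minimal sketch of how the digital twin might be exposed as a Gymnasium environment for use with Stable-Baselines3 is shown below; the twin interface, action dimensions, and observation layout are illustrative assumptions, not the implementation used in this work.

```python
import gymnasium as gym
import numpy as np

class GrindingCircuitEnv(gym.Env):
    """Hypothetical wrapper exposing the digital twin to Stable-Baselines3."""

    def __init__(self, twin):
        super().__init__()
        self.twin = twin  # assumed object with reset() and step(action) methods
        # Continuous setpoints (e.g., fresh feed rate, separator speed), normalized.
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        # Observed process variables (e.g., product size, reject rate, mill power).
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.twin.reset()
        return np.asarray(obs, dtype=np.float32), {}

    def step(self, action):
        obs, reward = self.twin.step(action)   # twin predicts the next process state
        terminated, truncated = False, False   # episodes end via an external time limit
        return np.asarray(obs, dtype=np.float32), float(reward), terminated, truncated, {}
```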
Algorithm Selection and Parameterization
A critical step was the selection of reinforcement learn-
ing algorithms suitable for grinding circuit control. We
evaluated both on-policy algorithms, namely Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), and off-policy algorithms, including Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), and Deep Deterministic Policy Gradient (DDPG). This diverse range enabled a comprehensive comparison to determine the most effective algorithms for navigating the continuous action space and addressing the complexities of the circuit.
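Under these assumptions, the candidate algorithms can be instantiated from Stable-Baselines3 in a few lines; default hyperparameters are shown purely for brevity, whereas in practice each algorithm would be parameterized separately.

```python
from stable_baselines3 import PPO, A2C, SAC, TD3, DDPG

env = GrindingCircuitEnv(twin)  # hypothetical wrapper from the sketch above

# All five candidates share the same MLP policy interface; hyperparameters
# (learning rate, network size, exploration noise, etc.) are left at defaults here.
candidates = {
    "PPO":  PPO("MlpPolicy", env),
    "A2C":  A2C("MlpPolicy", env),
    "SAC":  SAC("MlpPolicy", env),
    "TD3":  TD3("MlpPolicy", env),
    "DDPG": DDPG("MlpPolicy", env),
}
```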
Reinforcement Learning Training Process
The algorithms were trained within the digital twin environment, where they encountered a variety of simulated scenarios to refine their strategies and optimize operational setpoints. The iterative
learning process was guided by a reward function designed
to align with key operational goals: regulating particle size,
maximizing throughput, and controlling reject rates. This
approach ensured that algorithms could adapt effectively to
the dynamic conditions of the grinding process.
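A simplified training loop under the above assumptions might look as follows; the timestep budget and checkpointing scheme are illustrative choices, not the settings used in this study.

```python
from stable_baselines3.common.callbacks import EvalCallback

eval_env = GrindingCircuitEnv(twin)  # separate twin instance for periodic evaluation

for name, model in candidates.items():
    # Periodically evaluate the current policy and keep the best checkpoint.
    callback = EvalCallback(eval_env, eval_freq=10_000,
                            best_model_save_path=f"./checkpoints/{name}")
    model.learn(total_timesteps=500_000, callback=callback)
```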
Evaluation of Learning Performance
The efficacy of each algorithm was assessed by its ability
to meet control objectives within operational constraints.
Performance evaluations focused on stability, efficiency,
and adaptability to changes, using these metrics to identify
strengths and weaknesses. Insights from this comprehensive
evaluation aided in selecting the most suitable algorithms
for practical deployment in the grinding circuit.
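One straightforward way to compare the trained agents, sketched below, is Stable-Baselines3's built-in policy evaluation; the episode count is arbitrary, and the mean and standard deviation of the episodic reward serve only as proxies for efficiency and stability.

```python
from stable_baselines3.common.evaluation import evaluate_policy

for name, model in candidates.items():
    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=20)
    # Higher mean suggests better objective satisfaction; lower std suggests stability.
    print(f"{name}: {mean_reward:.1f} +/- {std_reward:.1f}")
```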
Further Evaluation in a Simulated Industrial
Environment
This phase involved developing a simulation environment built on virtual replicas of industrial-scale plants that reproduce real-time operational conditions. The
core objective was to integrate the refined reinforcement
learning models into this environment, enabling their
interaction and adaptation to various operational scenarios.
The efficacy of the control strategies was rigorously tested
by monitoring key performance indicators such as energy
consumption, product quality, and operational stability.
This evaluation phase was crucial for verifying the robust-
ness and adaptability of the control strategies under diverse
and changing conditions, preparing the models for real-
world deployment.
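As a schematic of this closed-loop testing, a trained policy can be rolled out against the plant-scale simulator while key performance indicators are logged; the simulator handle and KPI keys below are assumptions for illustration only.

```python
# Hypothetical rollout of the best-performing agent in the plant-scale simulator.
obs, _ = plant_sim.reset()                     # plant_sim: assumed Gymnasium-style simulator
kpis = {"energy_kwh_per_t": [], "product_size": [], "reject_rate": []}

for _ in range(10_000):                        # one simulated operating campaign
    action, _ = best_model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = plant_sim.step(action)
    for key in kpis:                           # KPIs assumed to be exposed via the info dict
        kpis[key].append(info.get(key))
    if terminated or truncated:
        obs, _ = plant_sim.reset()
```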
RESULTS AND DISCUSSION
Digital Twin Model Predictions
The development and evaluation of digital twin models
formed the cornerstone of this research, as detailed in our
methodology. The process involved a systematic and itera-
tive approach to model generation and selection. Two criti-
cal models were developed: one predicting the product size
of the grind and the other estimating the reject rate. The
selection of these models for the reinforcement learning
phase was based on their performance in test datasets, par-
ticularly their ability to generalize predictions accurately.
To evaluate the efficacy of these models, a compara-
tive analysis was performed. Predictions from the digital
twin models were compared against actual measurements
from a 1,000-minute operational period that included sev-
eral recipe changes. This period was chosen deliberately to
challenge the models under varying operational conditions,
thus providing a comprehensive assessment of their predic-
tive capabilities.
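Such a comparison can be quantified with simple error statistics over the validation window, for example as sketched below; MAE and RMSE are generic choices shown for illustration rather than the specific metrics reported here.

```python
import numpy as np

def prediction_errors(predicted, measured):
    """Mean absolute error and root-mean-square error over a validation window."""
    predicted, measured = np.asarray(predicted), np.asarray(measured)
    residuals = predicted - measured
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    return mae, rmse
```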
Figure 2 and Figure 3 illustrate a side-by-side compari-
son of the predicted values against the actual data for prod-
uct size and reject rate. Despite minor discrepancies noted
in the product size predictions over extended periods, the
models demonstrated high accuracy for short-term predic-
tions. This finding is particularly significant as it highlights
the models' utility for real-time control applications, aligning with the goals set forth in the introduction and methodology sections.