and capturing the complex dynamics of the grinding
circuit.
Model Development and Validation
Model Segmentation
The analysis of the grinding circuit highlighted the feasibility of independently modeling the separator, given its non-limiting design and size. This insight led to a two-segment modeling strategy:
Dynamic Separator Model: Utilizes LSTM networks combined with CNN layers to predict the particle size distribution (specifically the 80% passing size) from operational parameters. This model captures both time-series dependencies and spatial structure in the data, ensuring accurate predictions of the separator's performance (a sketch follows this list).
Ball Mill Model: This linear regression model predicts the reject mass flow rate while accounting for the time-dependent dynamics of the process. Key to this model is a time-lag component representing the delay between changes in operational setpoints (such as feed rate and mill power) and their effects on the output. The lag is calibrated using the average material retention time in the mill, allowing the model to account for the gradual impact of operational adjustments on the product size distribution (a second sketch follows this list).
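As a concrete illustration of the first segment, a minimal Keras sketch of such a CNN-LSTM hybrid is shown below; the input window length, the choice of four operational parameters, and the layer sizes are illustrative assumptions rather than the configuration used in this work:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical input window: 60 time steps of 4 operational
# parameters (e.g., separator fan speed, cage speed, feed rate,
# dedusting fan speed); the target is the 80% passing size (P80).
TIMESTEPS, N_FEATURES = 60, 4

separator_model = keras.Sequential([
    layers.Input(shape=(TIMESTEPS, N_FEATURES)),
    # 1-D convolution extracts local patterns across the window
    layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # LSTM layer captures the longer time-series dependencies
    layers.LSTM(64),
    layers.Dense(1),  # predicted P80 in micrometres
])
separator_model.compile(optimizer="adam", loss="mse")
```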
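The second segment can be sketched with scikit-learn: shifting the setpoint columns by the average material retention time (expressed in sampling steps) pairs each measured reject flow with the inputs that actually produced it. The column names, one-minute sampling interval, 12-step lag, and stand-in data are all hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Placeholder one-minute samples of mill setpoints and the
# measured reject mass flow rate (random stand-in data).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "feed_rate": rng.uniform(80, 120, 500),       # t/h
    "mill_power": rng.uniform(2000, 2500, 500),   # kW
    "reject_mass_flow": rng.uniform(20, 60, 500), # t/h
})

# Time-lag component: shift the setpoints by the average material
# retention time (assumed here to be ~12 one-minute samples) so the
# regression relates each output to the inputs that caused it.
LAG_STEPS = 12
lagged = raw[["feed_rate", "mill_power"]].shift(LAG_STEPS)
frame = pd.concat([lagged, raw["reject_mass_flow"]], axis=1).dropna()

reject_model = LinearRegression().fit(
    frame[["feed_rate", "mill_power"]], frame["reject_mass_flow"]
)
```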
This strategic segmentation enhances the specificity and efficacy of the analysis, ensuring a comprehensive understanding of each component's impact within the circuit.
Validation and Testing
The data-based digital twin models were rigorously evaluated using historical data not seen during the training phase. For this purpose, the collected data was split into two sets: 80% for training and 20% for testing. This split allowed a comprehensive assessment of the models' performance, ensuring they were tested against data reflecting a variety of operational scenarios. This validation process was critical to ascertain the accuracy and reliability of the models in simulating real-world scenarios and to refine them for enhanced predictive capabilities.
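A minimal sketch of such a split is shown below. The paper does not state whether the split was random or chronological; for time-series process data a chronological hold-out is the safer assumption, so shuffle=False is used here:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)  # placeholder feature history
y = np.arange(1000, dtype=float)    # placeholder targets

# shuffle=False keeps chronological order: the held-out final 20%
# contains operating periods the models never saw during training,
# and no future information leaks into the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
```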
Reinforcement Learning for Optimization
The development of a reinforcement learning environment is a critical step in training a reinforcement learning agent within a safe and controlled simulation. This environment defines the actions and observations within the grinding process, as well as a reward system to guide the agent towards process optimization. It also includes constraints to inform the agent of the limits of its actions, such as the maximum reject mass flow rate that can be conveyed within the circuit. A key aspect of this environment is the integration of a data-based digital twin, modeled to replicate the grinding circuit's dynamics. This integration not only allows the reinforcement learning algorithm to interact, learn, and adapt based on simulated feedback, but also ensures that the training is grounded in a realistic representation of the industrial process. The digital twin serves as a dynamic, high-fidelity model, providing the reinforcement learning agent with a rich simulated context in which to develop its decision-making capabilities.
Simulation Environment Development
The reinforcement learning environment was created using Farama's Gymnasium library (Towers et al., 2023), replicating the dynamics of the grinding circuit. This simulated environment is essential for training the reinforcement learning agent before deploying it in real-world systems.
Definition of Action and Observation Spaces
Both action and observation spaces in our approach are continuous, defining the range of possible interactions and feedback within the process; a minimal environment skeleton follows the list below.
Action Space: Includes setpoints such as fresh material feed rate, mill dedusting fan speed, and separator fan and cage speeds.
Observation Space: Encompasses feedback values
from the process, including measured and target
product size, mill power, feed material mixture, and
reject material flow rate.
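A minimal Gymnasium skeleton consistent with these spaces might look as follows. The normalized bounds, the space dimensions, and the placeholder transition in step() are assumptions; in the full environment the data-based digital twin would compute the next state, and constraints such as the maximum conveyable reject mass flow rate would be enforced on the agent's actions:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class GrindingCircuitEnv(gym.Env):
    """Illustrative skeleton of the grinding-circuit environment."""

    def __init__(self):
        # Setpoints: fresh material feed rate, mill dedusting fan
        # speed, separator fan speed, separator cage speed (normalized).
        self.action_space = spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32)
        # Feedback: measured P80, target P80, mill power, feed
        # material mixture, reject mass flow rate (normalized).
        self.observation_space = spaces.Box(0.0, 1.0, shape=(5,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}  # placeholder state

    def step(self, action):
        # The digital twin would map the new setpoints to the next
        # process state here; a random sample stands in for it.
        obs = self.observation_space.sample()
        reward = 0.0  # see the reward function design below
        return obs, reward, False, False, {}
```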
Reward Function Design
The reward function in reinforcement learning is fundamental for guiding the agent's decisions. For controlling a dry grinding circuit, we devised a reward function segmented into three objectives, each targeting a specific operational goal:
Regulation of Target Product Size: The primary goal is to maintain the particle size as close as possible to a predefined target value. A hysteresis value (typically 1 µm) creates a tolerance range around the target. The reward is calculated as 1,000 times the inverse of the absolute difference between the observed and target product size, penalizing deviations outside the hysteresis range.
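Read literally, this objective can be sketched as follows; how the reward is capped inside the hysteresis band is our assumption, not stated in the text:

```python
def size_reward(observed_p80: float, target_p80: float,
                hysteresis: float = 1.0) -> float:
    """Product-size objective: 1,000 times the inverse of the
    absolute deviation (in micrometres), with the hysteresis band
    around the target treated as full reward."""
    error = abs(observed_p80 - target_p80)
    if error <= hysteresis:
        return 1000.0 / hysteresis  # inside the tolerance band
    return 1000.0 / error           # decays as the deviation grows
```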
Maximizing Throughput: The second goal is to
maximize throughput while maintaining quality
at the targeted level. The reward increases linearly