focuses on analyzing and preparing the dataset, ensuring
it is ready for the model development phase. In the
model development phase, we build an AI engine that
predicts P2O5 quality from both operational and
laboratory inputs. The key steps in these phases
are:
Data Collection
Data were collected from two primary sources: Process
Information (PI) and Laboratory Information (LI). PI
provides real-time data collected directly from the
operational system, giving a live view of parameters
such as pH, water flow, and reagent dosages at one-minute
intervals. LI consists of bi-hourly laboratory analyses
conducted manually: assays of P2O5, silicon dioxide (SiO2),
calcium oxide (CaO), and Particle Size Distribution (PSD).
The dataset spans July 2022 to June 2024 and comprises
more than 682,000 rows and more than 25 columns, capturing
both fast-changing operational conditions and more slowly
varying chemical compositions. Integrating real-time
fluctuations with periodic, detailed lab measurements
gives the model a more complete view of the flotation
process.
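The text does not state how the one-minute PI readings were joined to the
less frequent LI assays. As a minimal sketch of one plausible approach, the
example below attaches the most recent lab assay to each process reading
using pandas `merge_asof`. The column names (`timestamp`, `ph`,
`water_flow`, `p2o5`), the sample values, the two-hour reading of
"bi-hourly", and the helper `align_pi_li` are all assumptions made for
illustration, not details taken from the study.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute process (PI) readings; names and values are illustrative.
pi = pd.DataFrame({
    "timestamp": pd.date_range("2022-07-01 00:00", periods=240, freq="min"),
})
pi["ph"] = 9.0 + 0.001 * np.arange(len(pi))
pi["water_flow"] = 100.0

# Hypothetical lab (LI) assays, assumed here to arrive every two hours.
li = pd.DataFrame({
    "timestamp": pd.date_range("2022-07-01 00:00", periods=2, freq="120min"),
    "p2o5": [29.5, 30.1],
})


def align_pi_li(pi, li, tolerance="2h"):
    """Attach the most recent lab assay (within `tolerance`) to each PI row."""
    pi = pi.sort_values("timestamp")
    li = li.sort_values("timestamp")
    return pd.merge_asof(
        pi,
        li,
        on="timestamp",
        direction="backward",              # look back to the latest assay
        tolerance=pd.Timedelta(tolerance)  # leave NaN if the assay is too old
    )


merged = align_pi_li(pi, li)
print(merged.iloc[[0, 119, 120, 239]])
```

A backward join avoids pairing a process reading with an assay taken later
in time. Whether the assay should instead be matched to the readings that
precede it (to predict quality ahead of the lab result) is a modelling
choice the text leaves open.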
Exploratory Data Analysis (EDA)
EDA helped us understand variable distributions and
relationships, guiding feature selection and preprocessing
choices.
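As a rough illustration of this kind of distribution check, the sketch
below bins a variable into a histogram and reports the bins that stand out
as local peaks. The data are synthetic, and the helpers `histogram_table`
and `local_peaks`, the bin width, and the 5% share threshold are all
hypothetical choices for the example, not the procedure used in the study.

```python
import numpy as np


def histogram_table(values, bin_width=0.5, lo=0.0, hi=14.0):
    """Count samples in fixed-width bins over [lo, hi]."""
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


def local_peaks(counts, edges, min_share=0.05):
    """Return centers of bins that are local maxima and hold at least
    `min_share` of all samples."""
    total = counts.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    padded = np.concatenate(([0], counts, [0]))  # zero-pad both ends
    peaks = []
    for i, c in enumerate(counts):
        left, right = padded[i], padded[i + 2]
        if c > left and c >= right and c >= min_share * total:
            peaks.append(float(centers[i]))
    return peaks


# Synthetic pH-like readings: two clusters plus a thin spread between them.
values = np.concatenate([
    np.full(300, 4.2),
    np.full(200, 7.1),
    np.linspace(5.0, 6.5, 50),
])

counts, edges = histogram_table(values)
peaks = local_peaks(counts, edges)
print(peaks)  # [4.25, 7.25]
```

Peaks flagged this way still need process knowledge to interpret: a cluster
of readings may reflect a real operating regime or a measurement artifact.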
Key Variable Distributions
Histograms were generated for key variables, including
pH and phosphoric acid flow rates. Figure 1 illustrates
the distribution of pH levels. The peak around 4 suggests
potential over-acidification, possibly caused by erroneous
sensor readings. The peak around 7
likely results from cleaning the probe in water. Similarly,
the phosphoric acid flow histogram in Figure 2 shows data
Figure 1. pH levels
Figure 2. Phosphoric acid flow