inconsistencies, particularly in the first bar, which were
addressed by treating them as missing values (NaNs) and
imputing appropriate values.
Correlation Analysis
A correlation heatmap of the plant data (Figure 3) illustrates the relationships between variables, revealing moderate correlations between P2O5 upgradation and feed characteristics such as SiO2 and CaO. These correlations informed feature selection by highlighting the variables most predictive of recovery outcomes.
Outlier Detection
Outliers in variables such as water flow and reagent dosages were identified using the Interquartile Range (IQR) method. Extreme values, identified through both statistical tests and visual inspection, were either excluded or imputed to ensure data consistency, thereby improving model reliability.
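The IQR rule can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the conventional Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR), the `"inclusive"` quantile interpolation, and median imputation for the flagged points. The text does not specify the multiplier, the quantile method, or the imputation strategy, and the water-flow readings are hypothetical.

```python
import math
from statistics import median, quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR (k=1.5 assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mask_outliers(values, k=1.5):
    """Replace values outside the IQR fences with NaN (treated as missing)."""
    low, high = iqr_bounds(values, k)
    return [v if low <= v <= high else math.nan for v in values]


def impute_median(values):
    """Fill NaNs with the median of the remaining valid values (assumed strategy)."""
    valid = [v for v in values if not math.isnan(v)]
    fill = median(valid)
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical water-flow readings with one spike.
water_flow = [10.2, 10.5, 9.8, 10.1, 55.0, 10.3, 9.9]
masked = mask_outliers(water_flow)
cleaned = impute_median(masked)
print(masked)
print(cleaned)
```

Masking outliers as NaN first keeps the "exclude" and "impute" options separate. Dropping the NaN entries corresponds to exclusion, while passing them to `impute_median` corresponds to imputation.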
Data Preprocessing and Feature Engineering
The preprocessing of the flotation dataset began with a crucial data integration step. Integrating two datasets with different temporal resolutions posed a significant challenge: the Process Information (PI) data were recorded at one-minute intervals, whereas the Laboratory Information (LI) data were recorded at intervals of two to four hours. Following the recommendation of subject matter experts (SMEs), we applied a backward-filling approach to the laboratory data. This method assumes that each lab result represents the condition of the samples over the period leading up to that result. For instance, if a lab result is received at 3 PM, the values at 1 PM, 2 PM, and every minute between 1 PM and 3 PM are assigned the 3 PM result, and this assignment extends back until the previous lab sample is reached. This approach ensured that each lab reading was aligned with the process conditions that preceded it, yielding a cohesive time series essential for accurate time-series analysis and predictive modeling. To ensure the accuracy and reliability of the data preprocessing method, the assumptions made by the SMEs were verified
Figure 3. Correlation heatmap
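The backward-filling alignment of minute-level PI timestamps to sparse LI results can be sketched with a binary search. Each process timestamp takes the first lab result at or after it, so every minute up to a lab sample inherits that sample's value, back to the previous sample. This is a minimal stdlib sketch under stated assumptions: the function name `backward_fill`, the timestamps, and the assay values are hypothetical. In pandas, `merge_asof(..., direction="forward")` expresses a similar forward-looking match.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def backward_fill(pi_times, lab_times, lab_values):
    """Give each process (PI) timestamp the next lab (LI) result at or after it.

    lab_times must be sorted ascending. PI timestamps later than the last
    lab sample have no subsequent result and are left as None.
    """
    filled = []
    for t in pi_times:
        i = bisect_left(lab_times, t)  # index of first lab sample with time >= t
        filled.append(lab_values[i] if i < len(lab_times) else None)
    return filled


# Hypothetical example: lab samples at 12:00 and 15:00, PI data every minute.
start = datetime(2024, 1, 1, 12, 0)
pi_times = [start + timedelta(minutes=m) for m in range(211)]  # 12:00 to 15:30
lab_times = [start, start + timedelta(hours=3)]
lab_values = [30.1, 31.4]  # illustrative lab assay values

aligned = backward_fill(pi_times, lab_times, lab_values)
# 12:01 through 15:00 (including 13:00 and 14:00) all take the 15:00 result.
```

Minutes after the most recent lab sample are left unfilled rather than guessed, because under this scheme their representative result has not yet been measured.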