Principal component analysis (PCA) is performed on each
section of the database using the PCA class in the
scikit-learn library for Python. The number of principal
components (PCs) to retain is determined from the percentage
of variability each PC captures: the smallest number of PCs
that cumulatively capture more than eighty percent of the
data's variability is chosen.
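The selection rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_pcs`, the standardisation step, and the synthetic input are assumptions, since the source does not say whether or how the data were scaled before PCA.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def select_pcs(section, threshold=0.80):
    """Return the leading PCs whose cumulative explained variance exceeds `threshold`."""
    # Assumption: features are standardised first; the source does not state this.
    scaled = StandardScaler().fit_transform(section.to_numpy(dtype=float))
    pca = PCA()  # keep every component so the full variance profile is available
    scores = pca.fit_transform(scaled)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # Index of the first cumulative ratio strictly greater than the threshold, plus one.
    n_pcs = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    n_pcs = min(n_pcs, scores.shape[1])  # guard against floating-point round-off
    columns = [f"PC{i + 1}" for i in range(n_pcs)]
    return pd.DataFrame(scores[:, :n_pcs], columns=columns, index=section.index)


# Synthetic stand-in for one database section (not the paper's data).
rng = np.random.default_rng(0)
demo_section = pd.DataFrame(
    rng.normal(size=(200, 4)), columns=["copper", "iron", "sulfur", "zinc"]
)
pcs = select_pcs(demo_section)
```

scikit-learn can also apply a variance threshold directly when `n_components` is given as a float between 0 and 1 with `svd_solver="full"`; the explicit cumulative sum is used here only to make the rule visible.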
The k-means clustering function requires the number of
clusters, k, as an input. A suitable number of clusters for
the dataset is determined using the elbow method: the
within-cluster sum of squares (WCSS) is plotted against the
number of clusters for each section of the database
(Figure 2). The number of clusters beyond which the curve
becomes approximately linear is chosen, which is eight in
this case. The PCs data frame output by the PCA function is
passed to the k-means function, together with the chosen
number of clusters, to obtain the clustered database. The
function appends a column to the database that assigns the
same numerical label to all rows belonging to one cluster.
The clustered data frames for the different sections are
then concatenated into one database.
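A minimal sketch of the elbow computation, the clustering step, and the concatenation is given below, assuming scikit-learn's `KMeans` (the source does not name the k-means implementation). The helper names `wcss_curve` and `cluster_section`, the `cluster` and `section` column names, and the synthetic PC data are hypothetical. The `section` column is an addition, included because cluster labels are assigned independently within each section.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def wcss_curve(pcs, max_k=10, seed=0):
    """Return {k: WCSS} for k = 1..max_k; KMeans exposes WCSS as `inertia_`."""
    curve = {}
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pcs)
        curve[k] = float(model.inertia_)
    return curve


def cluster_section(pcs, n_clusters, seed=0):
    """Return a copy of `pcs` with an integer `cluster` column appended."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    clustered = pcs.copy()
    clustered["cluster"] = model.fit_predict(pcs)
    return clustered


# Synthetic stand-ins for the per-section PC data frames (not the paper's data).
rng = np.random.default_rng(0)
sections = {
    name: pd.DataFrame(rng.normal(size=(60, 2)), columns=["PC1", "PC2"])
    for name in ["section_a", "section_b", "section_c"]
}

# Data behind the elbow plot for one section; plot k against WCSS to inspect it.
elbow = wcss_curve(sections["section_a"])

# Cluster every section with the chosen k (eight in the source) and concatenate.
combined = pd.concat(
    [
        cluster_section(pcs, n_clusters=8).assign(section=name)
        for name, pcs in sections.items()
    ],
    ignore_index=True,
)
```

Reading the elbow from the curve remains a visual judgment; the sketch only produces the values to plot.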
Figure 1. CuSum intervals plotted on mineral CuSum curves:
a) higher-sensitivity CuSum intervals overlaid on the
copper percentage CuSum curve; b) lower-sensitivity CuSum
intervals overlaid on the iron percentage CuSum curve; c)
overall CuSum intervals overlaid on the normalized copper,
iron, sulfur, and zinc CuSum curves
Table 1. Combinations of columns with null values in their
rows. True indicates that the column's rows contain null
values, and False that they do not. Each combination is
clustered separately.
Copper Grade   Iron Grade   Sulfur Grade   Zinc Grade
is Null        is Null      is Null        is Null
False          False        False          False
False          True         False          False
False          True         False          True
True           False        False          False
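One way to split a database into the null-value combinations listed in Table 1 is sketched below with pandas. The function name `split_by_null_pattern`, the column names, and the example values are hypothetical; the source does not describe how the split was implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for the four grade columns.
GRADE_COLUMNS = ["copper", "iron", "sulfur", "zinc"]


def split_by_null_pattern(db, columns):
    """Map each (is-null, ...) tuple to the rows of `db` that share that pattern."""
    is_null = db[columns].isna()
    groups = {}
    # Grouping the boolean frame by its own columns yields one group per pattern.
    for key, rows in is_null.groupby(list(columns)):
        groups[tuple(bool(v) for v in key)] = db.loc[rows.index]
    return groups


# Made-up values, one row per combination in Table 1.
demo_db = pd.DataFrame(
    {
        "copper": [1.2, 0.8, np.nan, 0.5],
        "iron": [30.1, np.nan, 28.4, np.nan],
        "sulfur": [2.0, 1.7, 2.2, 1.9],
        "zinc": [0.4, 0.3, 0.6, np.nan],
    }
)
groups = split_by_null_pattern(demo_db, GRADE_COLUMNS)
```

Each value in `groups` can then be reduced to its non-null columns and clustered on its own, in line with the table caption's statement that each combination is clustered separately.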
Figure 2. The elbow method to determine the number of
suitable k-means clusters, 8 in this case