but may lose sensitivity to fine detail, while smaller patches
capture finer details but may miss global patterns. The
influence of patch size on GLCM features was investigated
in this section, and the results are shown in Figure 5. The
median values show that, except for energy, the GLCM
features for BC and BBC are negligibly influenced by patch
size. However, the spread of the data, regardless of
the coal lithotype and GLCM feature, decreases as patch
size increases, indicating more consistent texture patterns
at larger patch sizes. As mentioned earlier, the selection of
patch size is a trade-off. The patch
size selection should focus on the fine detail of the images
without completely ignoring the global patterns. In addition,
a sliding-window side length of 30–50 mm was found to give
the best results for megascopic coal lithotype classification
in a previous study (Yu et al. 1997). Based on the image
resolution, this corresponds to a patch side length of
50 pixels in Figure 5. Thus, a patch size of 50×50 pixels
(50×50 mm) was selected for the following analyses.
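The patch-size comparison above can be illustrated with a minimal sketch. It assumes non-overlapping square patches (the text does not state whether the patches overlap) and uses a synthetic random image in place of a coal photograph. The helper names `split_into_patches`, `horizontal_contrast` and `contrast_spread_by_patch_size` are hypothetical. Only contrast is computed, using the fact that normalised GLCM contrast at angle 0 and distance 1 equals the mean squared difference between horizontally adjacent pixels when the gray levels are not requantised.

```python
import numpy as np

def split_into_patches(gray, size):
    """Non-overlapping size x size patches; partial patches at the edges are dropped."""
    h, w = gray.shape
    return [gray[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def horizontal_contrast(patch):
    """GLCM contrast at angle 0, distance 1, computed directly from pixel pairs."""
    p = np.asarray(patch, dtype=np.float64)
    return float(np.mean((p[:, 1:] - p[:, :-1]) ** 2))

def contrast_spread_by_patch_size(gray, sizes):
    """Median and interquartile range of patch contrast for each patch size."""
    summary = {}
    for size in sizes:
        values = np.array([horizontal_contrast(p)
                           for p in split_into_patches(gray, size)])
        q1, median, q3 = np.percentile(values, [25, 50, 75])
        summary[size] = {"n_patches": len(values),
                         "median": float(median),
                         "iqr": float(q3 - q1)}
    return summary

# Synthetic stand-in for a grayscale coal image (assumption, not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 300))
for size, stats in contrast_spread_by_patch_size(image, [10, 30, 50]).items():
    print(size, stats)
```

The median and interquartile range correspond to the centre line and box height of a box plot, so the same summary can be read against Figure 5 for real images.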
Using the minimum resolution recommended in a previous
study, the influence of the GLCM calculation parameters
was investigated in order to optimize them. Based on the
above analyses, an angle of 0 degrees, a distance of 1, and
a patch size of 50×50 mm were selected and used for the
following statistical analyses in this section. Besides the
GLCM features, two common statistics, the mean and
standard deviation, are included in the statistical analysis.
Once the coal images are converted to grayscale, they can
be treated as 2D arrays of pixel intensity values, from
which the mean and standard deviation are easily
calculated.
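As an illustration of the feature set described above, the following sketch computes the five GLCM features together with the mean and standard deviation for a single grayscale patch. It is not the authors' implementation. It assumes a symmetric, normalised GLCM and the commonly used feature definitions (as in scikit-image), neither of which is stated in the text. The function name `glcm_features` and the random 50×50 patch are hypothetical.

```python
import numpy as np

def glcm_features(gray, levels=256, dx=1, dy=0):
    """GLCM features for one non-negative pixel offset.

    dx=1, dy=0 corresponds to an angle of 0 degrees and a distance of 1.
    """
    gray = np.asarray(gray, dtype=np.intp)
    h, w = gray.shape
    # Reference pixels and their neighbours at the given offset.
    ref = gray[0:h - dy, 0:w - dx].ravel()
    nbr = gray[dy:h, dx:w].ravel()
    # Count co-occurrences, then symmetrise and normalise (assumed settings).
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (ref, nbr), 1.0)
    glcm = glcm + glcm.T
    p = glcm / glcm.sum()

    i, j = np.indices((levels, levels))
    diff = i - j
    mu_i = (i * p).sum()
    mu_j = (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    denom = sd_i * sd_j
    # A constant patch has zero variance; correlation is set to 1 by convention.
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / denom if denom > 0 else 1.0
    return {
        "contrast": float((p * diff ** 2).sum()),
        "dissimilarity": float((p * np.abs(diff)).sum()),
        "homogeneity": float((p / (1.0 + diff ** 2)).sum()),
        "energy": float(np.sqrt((p ** 2).sum())),
        "correlation": float(correlation),
    }

# Synthetic 50x50 patch standing in for a grayscale coal image (assumption).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(50, 50))
features = glcm_features(patch)
features["mean"] = float(patch.mean())  # first-order statistics of the 2D array
features["std"] = float(patch.std())
print(features)
```

Computing the matrix explicitly keeps the sketch dependent on NumPy only; a library routine could be substituted, provided its symmetry and normalisation settings match.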
A pairplot of the image features is shown in Figure 6.
It shows the pairwise relationship in the dataset by map-
ping different features of the dataset onto a column and
row in a grid of subplots. The data were plotted with differ-
ent colors based on the megascopic coal lithotype. Bivariate
distributions were drawn with kernel density estimation
(KDE) in the lower triangle, and pairwise scatter plots were
drawn in the upper triangle. The correlation coefficients
between each pair of features were calculated and marked
in the upper triangle. Along the diagonal, univariate distri-
bution plots with KDE were drawn to show the marginal
distribution of the features in each column based on the
coal lithotype.
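The correlation coefficients marked in the upper triangle can be reproduced from a table of per-patch feature values. The sketch below assumes Pearson correlation (the text does not name the coefficient) and uses a small made-up feature table. The helper name `upper_triangle_correlations` is hypothetical.

```python
import numpy as np

def upper_triangle_correlations(features):
    """Pearson correlation for every unordered pair of features.

    features: dict mapping a feature name to a 1D sequence with one value per patch.
    Returns a list of (name_a, name_b, r) in upper-triangle order.
    """
    names = list(features)
    data = np.vstack([np.asarray(features[n], dtype=np.float64) for n in names])
    corr = np.corrcoef(data)  # rows are treated as variables
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            pairs.append((names[a], names[b], float(corr[a, b])))
    return pairs

# Made-up values for illustration only, not measurements from the study.
table = {
    "contrast": [10.0, 12.0, 15.0, 20.0, 26.0],
    "dissimilarity": [2.0, 2.3, 2.7, 3.4, 4.1],
    "energy": [0.020, 0.015, 0.018, 0.012, 0.016],
}
for name_a, name_b, r in upper_triangle_correlations(table):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Filtering the returned list by the absolute value of r is one simple way to flag feature pairs that carry nearly the same information.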
A few observations can be made from the KDE plots
along the diagonal. First, for each feature, the central
peaks for the different coal lithotypes are located at
different positions, indicating that the average values
differ. Second, all the features for BC have narrower
distributions and higher peaks than those of BBC,
indicating that the BC features have lower variation and
are more concentrated. Third, for each feature, there are
some overlapped areas in the KDE plots between BC and
BBC. Normally, the overlapped area represents the data
range that is difficult to separate. However, due to the dif-
ference in central peak locations and the distribution, the
overlapped areas are not large, especially for dissimilarity,
correlation, and contrast. This indicates that these three are
the most important features for the classification of BC and
BBC images. At the same time, it can be found from the
KDE plots and the scatter plot that two groups of data can
be easily separated when these three features are involved.
However, dissimilarity and contrast show a correlation coef-
ficient of 0.99, indicating that they are highly correlated,
and it is reasonable to keep only one. As for the other
features, taking energy and homogeneity as examples, the
plots show that the BC features are completely covered by
the BBC features. However, the BC features have narrower
distributions than the BBC, and the data outside the BC
data range can be classified as BBC. For example, when the
homogeneity value is larger than 0.1 or the energy value is
larger than 0.02, the image can be easily classified as BBC.
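The reasoning above about overlapped KDE areas and the BC data range can be made concrete with a short sketch. It assumes a Gaussian kernel with Scott's rule-of-thumb bandwidth, since the text does not specify the KDE settings, and it uses synthetic samples rather than the study's data. The function names are hypothetical. The thresholds 0.1 and 0.02 are the values quoted in the text.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Gaussian KDE on a grid, with Scott's rule bandwidth (assumed setting)."""
    x = np.asarray(samples, dtype=np.float64)
    bandwidth = x.std(ddof=1) * x.size ** (-1.0 / 5.0)  # requires non-constant samples
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * bandwidth * np.sqrt(2.0 * np.pi))

def overlap_coefficient(a, b, n_grid=512):
    """Area under min(f_a, f_b): near 0 for separable classes, near 1 for identical ones."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pad = 0.25 * (hi - lo)  # extend the grid so the kernel tails are included
    grid = np.linspace(lo - pad, hi + pad, n_grid)
    shared = np.minimum(gaussian_kde_1d(a, grid), gaussian_kde_1d(b, grid))
    return float(shared.sum() * (grid[1] - grid[0]))  # Riemann sum on a uniform grid

def classify_outside_bc_range(homogeneity, energy, homogeneity_max=0.1, energy_max=0.02):
    """Return 'BBC' when a value exceeds the BC range quoted in the text, else None."""
    if homogeneity > homogeneity_max or energy > energy_max:
        return "BBC"
    return None  # inside the BC range: undecided from these two features alone

# Synthetic samples: a narrow distribution and a wider, shifted one (illustrative only).
rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.5, size=300)
wide = rng.normal(2.0, 1.5, size=300)
print("overlap:", overlap_coefficient(narrow, wide))
print(classify_outside_bc_range(homogeneity=0.15, energy=0.01))
```

The rule is deliberately one-sided: values inside the BC range return no label, because that range is shared by both lithotypes and needs the other features to resolve.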
Figure 5. Box plots of various GLCM features with varying patch sizes