learning algorithms to perform the text classification task.
Machine learning is a branch of artificial intelligence that
can be used to perform tasks automatically based on train-
ing data without explicit instructions. In this study, text
classification models using supervised machine learning are
used to categorize text into organized groups. With super-
vised learning, the goal is to learn the mapping function
from input to output given many examples in the form of
input-output pairs. The input-output pairs are known as
the training data, and the outputs are specifically known as
labels or ground truth. Machine learning models that can
automatically apply labels for classification are known as
classifiers. In this paper, various machine learning models
were trained on manually classified ground-fall data from
2010 to 2019. The training dataset is about 10%
of the entire MSHA dataset between 1983 and 2021. The
best model was selected to predict the ground-fall classification
for the entire MSHA dataset. Additionally, an interactive,
dynamic Dashboard was built to display ground-fall
trends in U.S. coal mines from 1983 to 2021. The Dashboard
includes several measures of potential ground-fall risk,
such as the number of injuries, fatalities, and
lost workdays, which serve as indices of the relative level of risk
among different operations and conditions in U.S. coal
mines. These metrics can be used to determine both the
problem areas and the progress in preventing work-related
injuries.
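To make the idea of learning a mapping from input-output pairs concrete, the following is a minimal sketch of a supervised text classifier. It assumes a multinomial naive Bayes model with add-one smoothing, chosen here only for brevity and not necessarily one of the models evaluated in this paper. The narratives, the two labels, and the function names (`tokenize`, `train`, `predict`) are hypothetical and are not taken from the MSHA dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(pairs):
    """Learn word counts per label from (narrative, label) training pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of narratives
    for text, label in pairs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for a new narrative."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, made-up training pairs (input narrative, output label).
TRAINING_DATA = [
    ("rock fell from the roof between bolts and struck employee", "roof fall"),
    ("piece of roof fell and hit the bolter operator on the shoulder", "roof fall"),
    ("rib rolled out and pinned the miner against the machine", "rib fall"),
    ("coal sloughed off the rib and struck employee on the leg", "rib fall"),
]

model = train(TRAINING_DATA)
prediction = predict(model, "a rock fell out of the roof and struck his hand")
```

Here the labels in `TRAINING_DATA` play the role of the ground truth, and `predict` is the classifier that applies a label to an unseen narrative. A real application would need far more labeled narratives and a held-out set to compare candidate models.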
MSHA DATABASE
The dataset used in this study is the accident/injury/illness
data, which can be accessed and downloaded from the CDC
website (NIOSH mining data, 2023). For a detailed definition
of a mine accident, see 30 CFR § 50.2 (MSHA, 1986).
The accident/injury/illness data includes 60 variables (col-
umn names) and 711,960 reported events/incidents (rows)
between 1983 and 2021. Under Part 50 of the U.S. Code of
Federal Regulations, mine operators and independent con-
tractors are required to file MSHA Form 7000-1 for report-
able incidents within 10 working days after the accident or
injury, or 10 working days following the illness diagnosis
(NIOSH, 2016). MSHA defines “reportable” as accidents,
occupational injuries, or occupational illnesses including all
incidents that require medical treatment or result in death,
loss of consciousness, inability to perform all job duties on
any workday following the injury, or temporary assignment
or transfer to another job. By regulation, certain accidents
and injuries are immediately reportable to MSHA within
15 minutes of their occurrence (NIOSH, 2016).
Among the immediately reportable incidents are the
death of an individual, an injury that has the potential to
cause death, an unplanned roof or rib fall in an active working
zone that impairs ventilation, and an unplanned roof
fall at or above the anchorage horizon in active workings where
roof bolts are in use (MSHA, 1986). MSHA classifies
the reported incidents into twenty-eight different categories.
Figure 1 shows the classification and the number of
reported incidents in U.S. coal and metal/nonmetal mines,
both surface and underground, between 1983 and 2021.
GROUND-FALL INCIDENTS AND NARRATIVES
The ground-fall-related incidents can be classified into two
groups: the first group is fall of roof, back, or brow (from
in place), while the second group is fall of face, rib, pillar,
side, or highwall (from in place). The category called “fall-
ing, rolling, or sliding rock or material of any kind” is not
considered a ground-fall category, which is why it is not
included in the analysis. About 11% of the incidents
reported to MSHA were due to ground falls (see Figure 1).
In this study, the authors focused only on the ground-fall
incidents that occurred in coal mines and excluded those
from metal and nonmetal mines because most of the
ground-fall incidents occurred in coal mines (Rashed et al.
2022). Classifying ground-fall incidents based on the main
cause and determining injuries and fatalities associated
with each category can help identify areas where additional
research is needed and where innovative solutions need to
be developed to reduce these potential occupational hazards.
Table 1 shows examples of some ground-fall narratives
and the classifications associated with them. Manually
reviewing the narrative of every reported incident would be
time-consuming, which is why machine learning models were
explored in this study to classify the ground-fall narratives.
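The selection of coal-mine ground-fall incidents described above can be sketched as a simple filter followed by an aggregation. This is an illustration only: the field names (`classification`, `commodity`, `year`, `days_lost`) and the sample rows are hypothetical and do not reproduce the actual MSHA column names or data. Only the two ground-fall category names come from the text.

```python
from collections import defaultdict

# The two ground-fall groups named in the text (lowercased for matching).
GROUND_FALL_CLASSES = {
    "fall of roof, back, or brow (from in place)",
    "fall of face, rib, pillar, side, or highwall (from in place)",
}


def is_coal_ground_fall(record):
    """True when a record is a ground-fall incident at a coal mine."""
    return (
        record["classification"].strip().lower() in GROUND_FALL_CLASSES
        and record["commodity"].strip().lower() == "coal"
    )


def lost_days_by_year(records):
    """Sum lost workdays per year over coal-mine ground-fall incidents."""
    totals = defaultdict(int)
    for record in records:
        if is_coal_ground_fall(record):
            totals[record["year"]] += record["days_lost"]
    return dict(totals)


# Hypothetical rows; field names and values are invented for illustration.
RECORDS = [
    {"year": 2018, "commodity": "Coal", "days_lost": 12,
     "classification": "Fall of roof, back, or brow (from in place)"},
    {"year": 2018, "commodity": "Coal", "days_lost": 3,
     "classification": "Fall of face, rib, pillar, side, or highwall (from in place)"},
    {"year": 2018, "commodity": "Coal", "days_lost": 7,
     "classification": "Falling, rolling, or sliding rock or material of any kind"},
    {"year": 2019, "commodity": "Metal", "days_lost": 20,
     "classification": "Fall of roof, back, or brow (from in place)"},
]

summary = lost_days_by_year(RECORDS)
```

The third row is dropped because its category is not treated as a ground fall, and the fourth because it is not from a coal mine. The same pattern could be applied to injury or fatality counts.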
Ground-fall narratives in the MSHA dataset are considered
to be unstructured data, meaning they do not have
a predefined format or organization, which sometimes makes
them more difficult for a human or a machine learning
algorithm to process. Additionally, some ground-fall narratives
are unclear, and even human-based classification is
difficult. Table 2 shows examples of these unclear ground-fall
narratives. The narratives are shown as they exist in the
MSHA dataset, without modification or editing. It is recommended
that future narratives follow a certain structure or
a template and use key phrases to distinguish between the
groups.
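As a rough illustration of why key phrases would help, the sketch below tags a narrative by whole-word matching against a short phrase list and falls back to "unclear" when no phrase, or phrases from both groups, are present. The phrase lists, group names, and the function `tag_narrative` are hypothetical and are not a template proposed by this paper.

```python
import re

# Hypothetical key phrases. "back", "face", and "side" are left out because
# they also describe body parts and directions in injury narratives.
KEY_PHRASES = {
    "roof group": {"roof", "brow"},
    "rib group": {"rib", "pillar", "highwall"},
}


def tag_narrative(text):
    """Return the single group whose key phrases appear, else 'unclear'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = [group for group, phrases in KEY_PHRASES.items() if words & phrases]
    return matches[0] if len(matches) == 1 else "unclear"
```

For example, `tag_narrative("Rock fell and struck employee")` returns "unclear" because the narrative never states where the rock came from, which is the kind of ambiguity a structured template would remove.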