D. Evaluation Metrics
To analyze the performance of the network's predictions, a Bayesian approach [20] is used.
\[
P(N \mid R_p) = \frac{P(R_p \mid N)\, P(N)}{P(R_p)} \tag{2}
\]
\[
P(R \mid N_p) = \frac{P(N_p \mid R)\, P(R)}{P(N_p)} \tag{3}
\]
where $P(R_p \mid N)$ is the probability of a pixel being predicted as Rock given that it is Not Rock, also called a False Positive. $P(R_p)$ is the probability of a pixel being predicted as Rock. $P(N)$ is the probability of a pixel in the test image belonging to the Not Rock classification. $P(N \mid R_p)$ is the probability of a pixel being Not Rock given that it was predicted to be Rock. $P(N_p \mid R)$ is the probability of a pixel being predicted as Not Rock given that it is Rock, also called a False Negative.
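As a minimal sketch of how the quantities in Eqs. (2) and (3) follow from per-image confusion counts via Bayes' rule (the function name and the example counts below are our own, not from the paper):

```python
def bayes_error_rates(tp, tn, fp, fn):
    """Derive P(N | R_p) and P(R | N_p) from confusion counts via Bayes' rule."""
    total = tp + tn + fp + fn
    p_rp = (tp + fp) / total        # P(R_p): pixel predicted as Rock
    p_np = (tn + fn) / total        # P(N_p): pixel predicted as Not Rock
    p_r = (tp + fn) / total         # P(R): pixel actually Rock
    p_n = (tn + fp) / total         # P(N): pixel actually Not Rock
    p_rp_given_n = fp / (tn + fp)   # P(R_p | N): False Positive rate
    p_np_given_r = fn / (tp + fn)   # P(N_p | R): False Negative rate
    # Eq. (2): P(N | R_p) = P(R_p | N) P(N) / P(R_p)
    p_n_given_rp = p_rp_given_n * p_n / p_rp
    # Eq. (3): P(R | N_p) = P(N_p | R) P(R) / P(N_p)
    p_r_given_np = p_np_given_r * p_r / p_np
    return p_n_given_rp, p_r_given_np
```

Note that Eq. (2) reduces to FP / (TP + FP), the fraction of predicted-Rock pixels that are actually Not Rock, which is a useful sanity check on the counts.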
A confusion matrix is computed for each image. This
allows for the categorization of each pixel into True Positive
(TP), True Negative (TN), False Positive (FP), and False
Negative (FN). This is appropriate for a binary classifica-
tion task as the eventual goal is to autonomously identify
the appropriate areas in the rock to drill.
The accuracy of the eventual prediction can be com-
puted as
\[
\text{Accuracy} = \frac{TP + TN}{\text{Number of Pixels}} \tag{4}
\]
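The per-pixel categorization and Eq. (4) can be sketched as follows, assuming binary Rock masks as NumPy arrays (the helper name is our own):

```python
import numpy as np

def confusion_and_accuracy(pred, gt):
    """Categorize each pixel of a predicted binary Rock mask against the
    ground-truth mask and compute the accuracy of Eq. (4)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = int(np.sum(pred & gt))      # predicted Rock, is Rock
    tn = int(np.sum(~pred & ~gt))    # predicted Not Rock, is Not Rock
    fp = int(np.sum(pred & ~gt))     # predicted Rock, is Not Rock
    fn = int(np.sum(~pred & gt))     # predicted Not Rock, is Rock
    accuracy = (tp + tn) / pred.size  # Eq. (4)
    return (tp, tn, fp, fn), accuracy
```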
DATA SET COLLECTION
The sensors used are two iniVation DAVIS (Dynamic and Active-pixel Vision Sensor) event cameras and one Intel RealSense L515 LiDAR camera. These are mounted rigidly as shown in Figure 9.
The pipeline was evaluated by collecting both train-
ing and testing data in the Edgar Mine in Idaho Springs,
Colorado. The hypothesis is that semantic masks can be
predicted for event-based images in severe environmental
conditions during drilling activities.
A. Extrinsic and Intrinsic Calibration
The sensors are calibrated using computer vision tech-
niques to compute the relative positions of the sensor axes.
From this, we can get the distance between the stereo event
cameras as well as the homogeneous transform between the
L515 color optical frame and each of the event cameras’
frames. This is the result of the extrinsic calibration [21].
The relative position of each sensor is computed with reference to the geometric center of the mobile rig via pose estimation from a fiducial marker [22], as in Figure 10.
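One way to realize this marker-based step is to express each sensor's pose in the rig frame by composing 4×4 homogeneous transforms through the shared fiducial. The sketch below uses our own frame-naming convention and helper names; it is an illustration of the composition, not the paper's calibration code:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def sensor_in_rig_frame(T_sensor_marker, T_rig_marker):
    """Pose of a sensor relative to the rig's geometric center, given the
    fiducial marker's pose in the sensor frame and in the rig frame.
    T_rig_sensor = T_rig_marker @ inv(T_sensor_marker)."""
    return T_rig_marker @ np.linalg.inv(T_sensor_marker)
```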
Homogeneous transforms can be chained together to map a depth image from the LiDAR to be centered around each of the event camera axes. This is used as the ground-truth depth image from the perspective of the event cameras, $^{E}I$:
\[
^{E}I = {}^{E}T_{L}\, {}^{L}I \tag{5}
\]
where $^{L}I$ is the depth image taken by the LiDAR, $^{E}I$ is the ground-truth depth image from the perspective of each of the event cameras, and $^{E}T_{L}$ is the transform between the LiDAR frame and each event camera frame.
Figure 8. Result of predicting both semantic and color masks using the network trained and validated on the color data set.
Figure 9. Sensor rig used during mobile data collection operations in the mine.
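Applying Eq. (5) in practice amounts to back-projecting the LiDAR depth pixels to 3D points, mapping them through the chained transform, and reprojecting into each event camera. A minimal sketch of the chaining and point-mapping steps, with placeholder transforms and our own helper names (not the paper's code):

```python
import numpy as np

def chain(*transforms):
    """Compose 4x4 homogeneous transforms left to right, e.g.
    T_event_lidar = chain(T_event_color, T_color_lidar)."""
    T = np.eye(4)
    for t in transforms:
        T = T @ t
    return T

def transform_points(T, pts):
    """Map an (N, 3) array of 3D points through a 4x4 homogeneous transform."""
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (T @ homog.T).T[:, :3]
```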