24-091
Using Event-Based Imaging and Deep Learning to Generate 3D
Surface Maps for Autonomous Roof Bolting
Rik Banerjee and Andrew J. Petruska
M3 Robotics Lab
Colorado School of Mines, Golden, CO, USA
ABSTRACT
This study explores implementing a machine learning-based system to generate a 3D surface representation of the roof and support straps in the mine. Event cameras have been chosen for their performance in high-dynamic-range lighting conditions and for their low latency. To enable automated drilling and bolting, 3D vision using event-based cameras has been developed. A ground-truth dataset of corresponding event-camera images and LiDAR surface maps is constructed using two time-synced event cameras and a LiDAR camera. The trained network is tested with stereo pairs of event images and produces a depth image with ±5 mm RMS error on average across 1,000 test images.
INTRODUCTION
This paper introduces the use of neuromorphic sensors, called event cameras, to perform the 3D perception task [1] of surface representation in the harsh environmental and lighting conditions of underground mines. These cameras offer several advantages over traditional cameras, such as low latency, high temporal resolution, and very high dynamic range [2]. This allows information to be captured in an active mining environment, as the sensor is resistant to motion blur from vibrating sensor platforms, shadows from single-point-source lighting, and dust clouds.
However, this sensor is novel enough that it lacks an associated dataset that is both large and diverse [3]. This is an additional hurdle to using commercial vision products, as there simply is not enough labeled data to design and train a generalized machine learning model. This problem is usually solved by using simulated data [4]. Nevertheless, simulators frequently lack the visual features commonly present in mining environments, as can be seen in Figure 1. Additionally, the shift from simulation to real-world conditions is recognized to be challenging [5]. The pipeline suggested in this paper serves as an alternative, offering a quick and cost-effective means of generating a labeled dataset.

This work is supported by the National Institute for Occupational Safety & Health, NIOSH Project 75D30121C12206.
The 3D perception task requires that image points (x, y), in pixels, be re-projected into world coordinates (X, Y, Z) in physical distance units. This can be achieved by incorporating depth from either a depth sensor [6] or a stereo camera pair [7] during operation. To mitigate the power and dust concerns associated with active sensors, stereo vision has been chosen to compute the depth of each pixel.
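As a concrete illustration of this re-projection, the following minimal sketch back-projects a depth image into camera-frame (X, Y, Z) coordinates under a standard pinhole model. The intrinsics fx, fy, cx, cy and the function name are illustrative assumptions, not calibration values from this work.

    import numpy as np

    def backproject_depth(depth, fx, fy, cx, cy):
        """Re-project each pixel (x, y) of a depth image into camera-frame
        coordinates (X, Y, Z) under a pinhole model.

        depth  : (H, W) array of per-pixel depth in meters (0 = invalid)
        fx, fy : focal lengths in pixels
        cx, cy : principal point in pixels
        Returns an (H, W, 3) array of 3D points.
        """
        h, w = depth.shape
        # u runs along image columns (x), v along image rows (y).
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        X = (u - cx) * depth / fx
        Y = (v - cy) * depth / fy
        return np.stack((X, Y, depth), axis=-1)

Because the model divides by the focal length in pixels, the returned coordinates inherit the units of the depth input (meters here).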
Stereo vision takes two images captured by cameras with a known distance offset and outputs a disparity map. Disparity is the lateral perspective shift between a pair of corresponding pixels in the left and right images. Since the baseline distance (the distance between the cameras) is known, the per-pixel disparity can be projected into depth using epipolar geometry [8], as in Equation 1.
$z = \frac{b \, f}{d}$   (1)

where,
d = the per-pixel disparity value,
b = the distance between the stereo cameras,
f = the focal length of the cameras used,
z = the depth of the pixel.
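As an illustration of Equation 1, the sketch below computes a disparity map from a rectified stereo pair and converts it to depth. The OpenCV semi-global matcher stands in for the learned network developed in this work, and the baseline and focal-length constants are placeholder values, not calibration from this study.

    import cv2
    import numpy as np

    # Placeholder calibration values for illustration only.
    BASELINE_M = 0.10   # b: distance between the stereo cameras, in meters
    FOCAL_PX = 900.0    # f: focal length, in pixels

    def disparity_to_depth(left_gray, right_gray):
        """Compute per-pixel depth z = b*f/d from a rectified stereo pair."""
        sgbm = cv2.StereoSGBM_create(minDisparity=0,
                                     numDisparities=64,  # must be divisible by 16
                                     blockSize=5)
        # compute() returns disparity in 1/16-pixel fixed-point units.
        d = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
        depth = np.zeros_like(d)
        valid = d > 0  # non-positive disparity means no reliable match
        depth[valid] = BASELINE_M * FOCAL_PX / d[valid]  # Equation 1
        return depth

Because depth is inversely proportional to disparity, small disparity errors on distant surfaces produce larger depth errors than the same errors on nearby surfaces, which motivates accurate per-pixel matching.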